A Research on XML Querying Based on Index
|School||China University of Petroleum|
|Course||Applied Computer Technology|
|Keywords||XML; coding schema; index; query|
As a tool for online data exchange, XML has many excellent characteristics, and more and more data is stored and exchanged in XML form. Traditional database technologies, however, cannot handle this new application efficiently, so indexing and query technologies designed specifically for XML data deserve attention. Based on an in-depth study and comparison of XML coding schemas, query technologies, and indexing technologies, we summarize the advantages and disadvantages of existing querying and indexing techniques and give criteria for a good index and a good coding schema.

Most existing indexes express and implement XML structural queries through path expressions. One of the core techniques is to make full use of the coding schema to judge efficiently the relationship between two arbitrary nodes of an XML document, such as the ancestor/descendant, parent/child, preceding-sibling/following-sibling, and level relationships. Only by determining these structural relationships efficiently can we avoid traversing the whole tree and improve the efficiency of structural join algorithms.

In this paper we present two new coding schemas. The first is a regional coding schema that introduces virtual nodes at the leaf positions of the tree. By assigning ID numbers to the virtual logical leaf nodes, we obtain the regional codes of their ancestor nodes. Analysis shows that this coding schema effectively determines the ancestor/descendant, parent/child, and level relationships between two arbitrary nodes of an XML document, and effectively avoids coding updates. To provide more structural information for determining the relationship between any two nodes, we propose a second coding schema based on prime numbers and sequence-encoded strings. This schema makes full use of the divisibility properties of prime numbers and of sequence-string matching techniques.
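The divisibility idea can be illustrated with a minimal sketch. It assumes a simple top-down prime labeling in which every node owns a unique prime and a node's label is the product of its parent's label and its own prime. This is one well-known way to exploit primes and is not necessarily the schema proposed in this paper; the sequence-string part is omitted. The names `Node`, `label_tree`, `is_ancestor`, and `is_parent` are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a small demo)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)
    self_prime: int = 1  # prime owned by this node (assumed scheme)
    label: int = 1       # product of the primes on the root-to-node path


def label_tree(root):
    """Assign labels in a single pre-order traversal."""
    gen = primes()
    stack = [(root, 1)]
    while stack:
        node, parent_label = stack.pop()
        node.self_prime = next(gen)
        node.label = parent_label * node.self_prime
        # Push children in reverse so the leftmost child is visited first.
        for child in reversed(node.children):
            stack.append((child, node.label))


def is_ancestor(a, d):
    """a is a proper ancestor of d iff a's label divides d's label."""
    return a.label != d.label and d.label % a.label == 0


def is_parent(p, c):
    """p is the parent of c iff c's label is p's label times c's own prime."""
    return c.label == p.label * c.self_prime
```

Because every prime is used exactly once, one label divides another only when the first node lies on the root path of the second, so both tests need nothing but the two labels. The cost of this simple variant is that labels grow quickly with document depth.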
The prime-number-based schema completes all coding work in constant time and can determine all kinds of positional relationships through a single pre-order traversal of the XML document tree. Last but not least, this coding schema supports all kinds of coding updates, and the number of nodes affected by inserting or deleting nodes is very small. We use the standard Sharks data to evaluate the efficiency of coding updates by comparing the PSB encoding schema with the Dietz schema, which represents regional encoding schemas. After nodes are inserted or deleted, the PSB schema affects fewer nodes than the Dietz schema. PSB is therefore a dynamic coding schema that supports updating a document's codes.

A coding schema reflects the positional relationships of all objects in the XML tree structure, whereas a path index describes the path structure of the source data. Traditionally, XML path expressions are processed in one of two ways. The first speeds up path expression evaluation by building a path index over the XML documents. The second encodes the nodes of the XML document tree and converts the evaluation of a path expression into structural join computation. To improve the efficiency of querying and updating, we combine the two methods, path index and coding schema. Specifically, instead of extracting path index information from the XML documents in the traditional way, we use the available DTD information, or a simple method based on DTD simplification rules, to extract the main path information. This yields a smaller index structure that can serve as the path index of the document. Next, we encode both the XML document tree and the simplified DTD index structure with the new virtual-node regional coding schema. Before querying the index structure of the XML documents, we execute the query against the simplified DTD schema. Through this step we can determine the legality of the path expression query. Only legal queries proceed to the XML index query; otherwise, we terminate the query promptly and report that it is illegal. Because this indexing scheme screens out illegal queries at a smaller scale, the simplified DTD query stage, and combines the merits of both path indexing and coding schemas, it achieves higher efficiency.

We compare the query efficiency of SpinX with that of our scheme. The experiments show that our scheme costs slightly more than SpinX in the encoding and indexing stages, but it achieves good query efficiency as the size of the queried data increases.
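The legality check against the simplified DTD can be sketched as follows. This is only an illustration under stated assumptions: the simplified DTD is modeled as a mapping from each element to its allowed child elements, path expressions contain only `/` (child) and `//` (descendant) steps with plain element names, and the sample DTD, `ROOT`, `descendants`, `parse_path`, and `is_legal` are all hypothetical names rather than the structures used in this paper.

```python
import re

# Hypothetical simplified DTD: element -> set of allowed child elements.
DTD = {
    "library": {"book"},
    "book": {"title", "chapter"},
    "chapter": {"title", "section"},
    "section": {"title"},
    "title": set(),
}
ROOT = "library"


def descendants(graph, element):
    """Elements reachable from `element` through one or more child steps."""
    seen, stack = set(), list(graph.get(element, ()))
    while stack:
        e = stack.pop()
        if e not in seen:
            seen.add(e)
            stack.extend(graph.get(e, ()))
    return seen


def parse_path(path):
    """Split '/a//b' into [('/', 'a'), ('//', 'b')]."""
    steps = re.findall(r"(//|/)([^/]+)", path)
    if not steps or "".join(axis + name for axis, name in steps) != path:
        raise ValueError(f"unsupported path expression: {path!r}")
    return steps


def is_legal(path, dtd=DTD, root=ROOT):
    """Return True if the DTD allows at least one match for `path`."""
    # A virtual document node whose only child is the root element.
    graph = {**dtd, "#document": {root}}
    context = "#document"
    for axis, name in parse_path(path):
        if axis == "/":
            allowed = graph.get(context, set())
        else:
            allowed = descendants(graph, context)
        if name not in allowed:
            return False  # reject before touching the document index
        context = name
    return True
```

For example, `is_legal("/library/book//title")` accepts the query, while `is_legal("/library/title")` rejects it because the sample DTD does not allow `title` directly under `library`. The check touches only the small DTD graph, so an illegal query is rejected without consulting the index over the document itself.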