Cost-based XPath Query Optimization
|School||South China University of Technology|
|Keywords||XML; XPath; query optimization; cost estimation|
With the development of Web technology, especially mobile Web technology, the information and data that people share and exchange have become increasingly large and complex, and the traditional relational database faces new challenges. XML is a self-describing markup language that is exchangeable and can describe data with a tree or graph structure. Its advantages are independence from system and platform, convenience in describing the semantics of data, and the ability to describe structured, semi-structured and unstructured data. XML has become a de facto standard for data storage and exchange on the Internet.

There are two major approaches to managing XML data with database technology. One is the XML-enabled database, which uses or extends a traditional relational or object-oriented database to handle XML data. Its advantage is that it fully reuses mature relational database technology, but it cannot handle XML data according to XML's own characteristics, and the multiple transformations during query processing reduce efficiency. The other approach is the native XML database, which is tailor-made for XML data and overcomes the disadvantages of the former.

Because XML data is hierarchical, most techniques that are mature in traditional databases, such as query optimization, storage and transaction management, cannot be carried over to XML databases. Native XML databases are therefore a popular research topic in the database field. Query processing is one of the most important functions of a modern database system, and query optimization has a critical influence on the performance of query processing.
XPath is the basis of almost all XML query languages, so the efficiency of processing XPath path expressions is very important, which is the motivation for this work.

In this paper, we first review and analyze the basic concepts and related technologies of XML databases. We then introduce a method for collecting statistics on XML document values, combined with statistics on XML document structure. Next, we present a cost model for XPath statements based on these document statistics, together with a query optimization method based on that cost model. We further introduce heuristic rules that improve the optimization method by lowering the time complexity of the algorithm. Finally, we implement the method in the XSQS system and run tests on it. The experimental results show that the algorithm effectively optimizes the query performance of XPath statements.
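
The general idea of statistics-driven XPath optimization can be illustrated with a minimal sketch. It is not the cost model or the algorithm implemented in XSQS; it only assumes that structural statistics are kept as element counts per root-to-node label path, and that a predicate matching fewer elements is cheaper to test first. The function names `path_statistics` and `order_predicates` and the sample document are hypothetical, and value statistics are left out.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_statistics(xml_text):
    """Count how many elements occur under each root-to-node label path."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def visit(node, prefix):
        path = prefix + "/" + node.tag
        counts[path] += 1
        for child in node:
            visit(child, path)

    visit(root, "")
    return counts


def order_predicates(counts, context_path, predicate_children):
    """Order child-existence predicates so the rarest child is tested first.

    Assumption: the element count of context_path/child stands in for the
    cost of the predicate [child]; a real cost model would be richer.
    """
    def estimated_matches(child):
        return counts.get(context_path + "/" + child, 0)

    return sorted(predicate_children, key=estimated_matches)


# Hypothetical sample document, used only for illustration.
DOC = """<lib>
  <book><title/><author/><award/></book>
  <book><title/><author/></book>
  <book><title/></book>
</lib>"""

stats = path_statistics(DOC)
# Candidate rewrite of /lib/book[title][author][award]
plan = order_predicates(stats, "/lib/book", ["title", "author", "award"])
print(plan)
```

With these statistics, the predicate on `award` is placed first because it matches the fewest elements, so most `book` candidates are discarded before the more common predicates are evaluated.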