Research on XML Enabled File System
|School||University of Science and Technology of China|
|Course||Computer Software and Theory|
|Keywords||File system; XML; XML-Enabled Database; Semantic File System; structure-based XML storage; Servant/Exe-Flow model; Minicore OS|
With the rapid development of information technology, computing performance, the volume of digital data to be stored, and the complexity of information structures are all increasing. Under these circumstances, stronger capabilities for organizing and managing information are needed. File systems, an important component of operating systems, provide data storage and management. Current file systems, however, have two shortcomings: (1) limited capability for managing huge volumes of data, and (2) no support for the semantic structure of files, which leaves operating systems without fine-grained control over information.

On the other hand, XML has emerged amid this information explosion. Its self-describing and cross-platform characteristics have made it a standard for information storage and exchange on the Internet, and how to store XML data efficiently has become a research focus.

First, this thesis analyzes several typical file systems, including FAT, EXT2, SemanticFS, NTFS and WinFS, and summarizes their main design ideas and logical architectures. Further research shows that traditional file systems cannot manage large amounts of complex structured and semi-structured data. The key reason is that traditional file systems regard data as a bit stream without semantics and rely on special-purpose applications to operate on the inner structure and content of files, so files can only be controlled at a coarse granularity. As the demands of information processing grow, however, an OS needs not only strong query and search abilities but also fine-grained control over information. This raises the problem of how to strengthen the OS's ability to manipulate structure and semantics.

Next, this thesis analyzes XML and its related techniques and summarizes the characteristics of XML. On this basis, several XML storage methods are analyzed in detail: file system based, XML-enabled database, native XML database, and object manager based methods.
This thesis focuses on the file system method. The traditional way of storing XML on a file system treats XML data as a flat document file. An XML document is in fact a serialized data stream, usually in the form of a flat file. In this approach each XML document is stored in one document file and a query engine is implemented at a higher level; when a query is executing, the XML file is parsed into a tree in memory. The advantage of file system based XML storage is that it is easy to implement and needs no extra database or other manager. Its disadvantages are equally obvious:

1) Before each access, the XML document has to be parsed again;
2) During the whole query process, the file being parsed has to stay in memory;
3) An extra index can be used to solve some of these problems, but the index is hard to maintain when the XML document is updated.

Based on the research above, this thesis combines ideas and techniques from databases and SemanticFS with the characteristics of XML and, by modifying the logical layout of traditional file systems so that the semantic structure of XML files is stored, proposes a new XML enabled storage method on the file system. The method takes advantage of the similarity between the directory/file hierarchy model and the XML data model and connects them seamlessly: an XML element node is regarded as a directory inode and an XML attribute node as a file inode. It is implemented efficiently on the Linux Ext2 file system. Under this storage method, some further problems are stated, and a new XML enabled file system is proposed by solving them.
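
The element-as-directory, attribute-as-file mapping can be illustrated with a small user-space sketch. It is not the thesis's inode-level implementation on Ext2; it only emulates the idea with ordinary directories and files. The function names, the `<position>_<tag>` directory naming, the `@name` attribute files and the `#text` file for element text are all assumptions made for illustration, and the sketch assumes documents without XML namespaces.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def store_element(elem: ET.Element, parent_dir: Path, position: int) -> Path:
    """Store one XML element as a directory and its attributes as files."""
    # Hypothetical naming: prefix the tag with the sibling position so that
    # repeated tags do not collide and document order is preserved.
    elem_dir = parent_dir / f"{position}_{elem.tag}"
    elem_dir.mkdir()

    # Each attribute becomes one small file holding the attribute value.
    for name, value in elem.attrib.items():
        (elem_dir / f"@{name}").write_text(value, encoding="utf-8")

    # Assumption: element text goes into a reserved "#text" file.
    text = (elem.text or "").strip()
    if text:
        (elem_dir / "#text").write_text(text, encoding="utf-8")

    # Child elements become subdirectories, recursively.
    for index, child in enumerate(elem):
        store_element(child, elem_dir, index)
    return elem_dir


def read_attribute(root_dir: Path, element_path: str, attribute: str) -> str:
    """Read a single attribute by path, without parsing the whole document."""
    return (root_dir / element_path / f"@{attribute}").read_text(encoding="utf-8")


if __name__ == "__main__":
    doc = '<library><book id="b1" lang="en"><title>XML Storage</title></book></library>'
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        store_element(ET.fromstring(doc), root, 0)
        print(read_attribute(root, "0_library/0_book", "id"))  # prints: b1
```

Once the document is stored this way, reading one attribute touches only one small file, which is the kind of fine-grained access the flat-file approach cannot offer without re-parsing the document.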
The system architecture is described, the organization and relationships of the different modules are given, and finally concrete implementations for each part are listed.

The research has been applied to Minicore, an OS based on the Servant/Execution Flow OS model. This thesis presents the design and implementation of the XML enabled file system on Minicore, which demonstrates the validity, good performance and extra functions of the research.

In summary, the new contributions of this thesis include:

1. A new semantics-based XML data storage method on the file system is proposed, which lets the file system manipulate files in depth and provides direct XML storage and extra query functions.
2. A new XML enabled file system construction model is presented, which adds an XML analyzer, an index engine, a query engine and an object manager. For each part of the XML enabled file system architecture, implementation techniques are given, including data structures, algorithms and interfaces on Linux. This method enables the OS to provide good XML data management and new functions.
3. A new implementation method for XML data, based on the directory/file index nodes of a traditional file system, is designed. It takes advantage of the similarity between the file system's directory/file structure and XML data structures to connect XML to the traditional file system: an element node is regarded as a directory index node and an attribute node as a file index node, so the file system needs no major changes yet can manipulate XML data at a fine granularity.
4. The research is applied to MiniCore OS, which is based on the Servant/Execution-Flow model; the design and implementation of the XML enabled file system on MiniCore are given, and its validity and performance are tested and verified.