Study of Lucene-Based Full-Text Retrieval Technology for the Waterway Foundation Database Platform
|School|Dalian Maritime University|
|Course|Management Science and Engineering|
|Keywords|Waterway foundation database platform; Full-text retrieval based on Lucene; Chinese word segmentation; Multi-file-format text parsing; Global cross-database|
In the 21st century, an era of informatization and networking, people can work, make friends, and carry out many other activities without leaving home. As the volume of information and data keeps growing, however, quickly extracting useful data from massive data sets has become an urgent problem in information and data management. The main means of obtaining meaningful data from massive data is information retrieval, and full-text retrieval has a clear advantage for searching massive data: an index is first built over the data to be retrieved, and queries are then answered by searching that index rather than the raw data. Based on the requirements of the system and a study of full-text retrieval and the Lucene full-text retrieval framework, this thesis designs and implements a Lucene-based full-text retrieval system for the waterway foundation database platform: the user enters keywords on the search page and obtains the matching full-text data resources.
The full-text search technology in this thesis is built on the waterway foundation database platform, which has the following characteristics: (1) it is specific to the waterway transportation domain; (2) it draws on a variety of data sources (Word documents, PDF documents, Excel documents, database records, etc.); (3) these sources consist of Chinese office documents and database records. After an in-depth study of Lucene, the open-source Java full-text search framework, and in light of the characteristics of the platform, I extend the functions of the Lucene full-text search framework.
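The two-phase process described above (build an index, then search the index) can be illustrated with a minimal in-memory inverted index. This is a sketch under stated assumptions, not Lucene's implementation or API: the class `InvertedIndexSketch` and its methods are hypothetical, the tokenizer simply lower-cases text and splits on non-letter, non-digit characters, and queries use AND semantics with no relevance ranking. Note that such a tokenizer would treat an unbroken run of Chinese characters as a single term, which is why Chinese word segmentation matters for this platform.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration of an inverted index; NOT Lucene's API.
final class InvertedIndexSketch {
    // term -> sorted IDs of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Simplified tokenizer (assumption): lower-case, split on non-letter/non-digit.
    // Lucene delegates this step to an Analyzer.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Indexing phase: record each term of the document in the postings map.
    void addDocument(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Search phase: intersect the postings of all query terms (AND semantics).
    SortedSet<Integer> search(String query) {
        SortedSet<Integer> result = null;
        for (String term : tokenize(query)) {
            SortedSet<Integer> docs = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument(1, "Channel depth survey report");
        index.addDocument(2, "Lock maintenance report");
        index.addDocument(3, "Channel dredging plan");
        System.out.println(index.search("channel report")); // [1]
        System.out.println(index.search("report"));         // [1, 2]
    }
}
```

Lucene's real index also stores term frequencies, positions, and stored fields, and it ranks hits by score; the sketch only shows why a query can be answered from the index without scanning every document.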
For example, Lucene's framework provides two Chinese text analyzers, but neither segments nor filters Chinese words effectively; to meet the needs of full-text search over waterway data, this thesis improves Lucene's Chinese analyzer. As another example, Lucene can process only plain text, whereas the data to be indexed includes documents in many formats; this study therefore designs a processing interface for a variety of popular document formats, which solves the problem of indexing multi-format documents. As a further example, the waterway foundation database platform spans multiple databases, so this thesis designs a global cross-database retrieval module that integrates data from these databases and provides single-point retrieval, which greatly reduces the complexity of searching for data. Taking these characteristics of the waterway foundation database platform into account, this thesis studies the key technologies of Lucene full-text search in depth, analyzes and designs them in detail, and implements them.
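One common way to improve Chinese segmentation is dictionary-based forward maximum matching (FMM) combined with a stop-word filter. The abstract does not say which algorithm the improved analyzer uses, so this is only an illustrative sketch: the class `ForwardMaxMatchSegmenter`, its small waterway dictionary, and its stop-word list are hypothetical, and it assumes all characters lie in the Basic Multilingual Plane (no surrogate pairs). In an actual Lucene deployment, logic like this would live inside a custom `Tokenizer`/`Analyzer` rather than a standalone class.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical FMM segmenter with stop-word filtering; not the thesis's code.
final class ForwardMaxMatchSegmenter {
    private final Set<String> dictionary;
    private final Set<String> stopWords;
    private final int maxWordLength;

    ForwardMaxMatchSegmenter(Collection<String> words, Collection<String> stopWords) {
        this.dictionary = new HashSet<>(words);
        this.stopWords = new HashSet<>(stopWords);
        int max = 1;
        for (String w : words) {
            max = Math.max(max, w.length());
        }
        this.maxWordLength = max;
    }

    // Scan left to right; at each position take the longest dictionary word,
    // falling back to a single character when nothing in the dictionary matches.
    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            while (end - pos > 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            String word = text.substring(pos, end);
            if (!stopWords.contains(word)) { // filtering step: drop stop words
                tokens.add(word);
            }
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical waterway dictionary and stop-word list.
        ForwardMaxMatchSegmenter seg = new ForwardMaxMatchSegmenter(
                List.of("长江", "航道", "航道维护", "维护", "疏浚", "工程"),
                List.of("的"));
        System.out.println(seg.segment("长江航道维护疏浚工程")); // [长江, 航道维护, 疏浚, 工程]
        System.out.println(seg.segment("长江航道的维护"));       // [长江, 航道, 维护]
    }
}
```

FMM is fast but greedy, so it can mis-split ambiguous strings; practical segmenters often add backward matching or statistical models. A domain dictionary of waterway terms is what keeps compounds such as 航道维护 intact.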
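The multi-format processing interface can be pictured as a registry that maps a file extension to a text extractor, so that every format is reduced to the plain text Lucene indexes. Everything here is hypothetical: `ExtractorRegistry` and `TextExtractor` are illustrative names, and only a plain-text extractor is registered so that the sketch runs with the JDK alone. Real Word, PDF, and Excel extractors would wrap third-party parsing libraries (Apache POI and Apache PDFBox are common choices), which this sketch does not call.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical multi-format text-extraction registry; not the thesis's code.
final class ExtractorRegistry {
    // Extension point: each format turns raw bytes into plain text.
    @FunctionalInterface
    interface TextExtractor {
        String extract(byte[] content) throws Exception;
    }

    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Pick the extractor from the file-name extension; reject unknown formats.
    String extractText(String fileName, byte[] content) throws Exception {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        TextExtractor extractor = byExtension.get(ext);
        if (extractor == null) {
            throw new IllegalArgumentException("Unsupported format: " + fileName);
        }
        return extractor.extract(content);
    }

    public static void main(String[] args) throws Exception {
        ExtractorRegistry registry = new ExtractorRegistry();
        // Plain text only needs decoding.
        registry.register("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));
        // Word/PDF/Excel extractors would be registered the same way,
        // each wrapping a parsing library (not shown here).
        byte[] raw = "Channel depth notice".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.extractText("Notice.TXT", raw)); // Channel depth notice
    }
}
```

Keeping the indexer unaware of file formats means a new format only requires registering one more extractor.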
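The global cross-database module can be pictured as a fan-out search: the query is sent to every per-database search, and the hits are merged into one ranked list. This is a hedged sketch: `GlobalSearchSketch`, `Hit`, and the source names are hypothetical, each database search is modelled as a plain function, and it assumes the scores from different sources are already on a comparable scale (raw relevance scores from separate indexes generally are not).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical fan-out/merge search over several databases; not the thesis's code.
final class GlobalSearchSketch {
    // One hit: source database, record key, and (assumed comparable) score.
    static final class Hit {
        final String source;
        final String key;
        final double score;

        Hit(String source, String key, double score) {
            this.source = source;
            this.key = key;
            this.score = score;
        }

        @Override
        public String toString() {
            return source + ":" + key;
        }
    }

    // Per-database searchers, keyed by a database name.
    private final Map<String, Function<String, List<Hit>>> sources = new LinkedHashMap<>();

    void addSource(String name, Function<String, List<Hit>> searcher) {
        sources.put(name, searcher);
    }

    // Send the query to every source, merge all hits, keep the k best by score.
    List<Hit> search(String query, int k) {
        List<Hit> merged = new ArrayList<>();
        for (Function<String, List<Hit>> searcher : sources.values()) {
            merged.addAll(searcher.apply(query));
        }
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }

    public static void main(String[] args) {
        GlobalSearchSketch global = new GlobalSearchSketch();
        // Stub sources standing in for real per-database searches.
        global.addSource("channelDb", q -> List.of(
                new Hit("channelDb", "survey-17", 0.62),
                new Hit("channelDb", "plan-03", 0.20)));
        global.addSource("documentDb", q -> List.of(
                new Hit("documentDb", "report.pdf", 0.85)));
        System.out.println(global.search("channel depth", 2)); // [documentDb:report.pdf, channelDb:survey-17]
    }
}
```

When all the data is indexed into Lucene indexes, Lucene's `MultiReader` can instead present several indexes to a single `IndexSearcher`, so scoring happens in one place rather than being merged afterwards.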