The Study of Text Index Construction for Large-Scale Dynamic Collection
|School||Harbin Institute of Technology|
|Course||Computer Science and Technology|
|Keywords||dynamic set of documents; index construction; index merge|
In a dynamic collection environment, the organization of the index must strike a balance between indexing efficiency and search efficiency. To make index updates efficient, the index of documents should be stored in blocks chained by links, but this reduces search efficiency, and vice versa. Index model design therefore treats both indexing efficiency and retrieval efficiency as important factors. In traditional index models, retrieval efficiency is given priority, and real-time index updating is sometimes relegated to a secondary position. With the growing diversity of search applications, however, real-time retrieval and background index maintenance are becoming more and more important.
This paper focuses on online indexing for large-scale collections. In a dynamic collection environment, the search system provides search services while it constructs the inverted index: through an index maintenance strategy, the index is updated soon after the dynamic collection changes. Through experiment and comparison, we designed a model that optimizes index maintenance: while search efficiency is preserved, the index construction performance of the real-time indexing system is greatly improved. In this model, we present a complete-tree-based approach to building the online index and use the properties of the complete tree to control index merging so as to reduce the merge cost; indexing and retrieval performance can be adjusted as needed. Compared with previous methods, the model shows higher performance and better scalability in our experiments.
Using a system based on this model, we carried out experiments on the balance between indexing and retrieval efficiency and on query performance prediction, and drew some conclusions about indexing efficiency and retrieval efficiency, the two key factors in indexing a dynamic collection.
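
The abstract does not spell out how the complete tree controls merging, so the following is only a minimal sketch of one plausible reading, not the thesis's actual algorithm. It assumes that flushed sub-indexes act as leaves of a tree with a fixed fan-out, and that a full group of siblings is merged into one sub-index at the next level. The class name `TreeMergeIndex` and the parameters `fanout` and `buffer_docs` are hypothetical names introduced for illustration.

```python
from collections import defaultdict


class TreeMergeIndex:
    """Toy online inverted index with tree-style merging (illustrative only)."""

    def __init__(self, fanout=2, buffer_docs=2):
        self.fanout = fanout              # assumed knob: sub-indexes merged per tree node
        self.buffer_docs = buffer_docs    # assumed knob: documents buffered before a flush
        self.buffer = defaultdict(list)   # term -> doc ids not yet flushed
        self.buffered = 0
        self.levels = defaultdict(list)   # tree level -> list of sub-indexes
        self.merges = 0                   # number of merge operations performed

    def add(self, doc_id, text):
        """Index one document; it is searchable immediately via the buffer."""
        for term in set(text.lower().split()):
            self.buffer[term].append(doc_id)
        self.buffered += 1
        if self.buffered >= self.buffer_docs:
            self._flush()

    def _flush(self):
        """Turn the buffer into a leaf sub-index, then merge full levels upward."""
        self.levels[0].append(dict(self.buffer))
        self.buffer = defaultdict(list)
        self.buffered = 0
        level = 0
        while len(self.levels[level]) >= self.fanout:
            merged = defaultdict(list)
            for part in self.levels[level]:
                for term, postings in part.items():
                    merged[term].extend(postings)
            self.levels[level] = []
            self.levels[level + 1].append(
                {term: sorted(postings) for term, postings in merged.items()}
            )
            self.merges += 1
            level += 1

    def search(self, term):
        """Look the term up in the buffer and in every sub-index."""
        term = term.lower()
        hits = list(self.buffer.get(term, []))
        for parts in self.levels.values():
            for part in parts:
                hits.extend(part.get(term, []))
        return sorted(hits)

    def partitions(self):
        """Number of on-disk-style sub-indexes a query currently has to visit."""
        return sum(len(parts) for parts in self.levels.values())
```

In this sketch a larger `fanout` means fewer merges during indexing but more sub-indexes for each query to visit, which is one way the trade-off between indexing and retrieval performance could be adjusted as needed.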
The query performance prediction mechanism we present, in particular its pre-retrieval predictors, also gives the system room to expand into research on retrieval models.
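
The abstract does not say which pre-retrieval predictors are used. As an illustration of the general idea, the sketch below computes the average inverse document frequency of the query terms, a commonly used pre-retrieval predictor that needs only index statistics and no retrieval run. The function name `average_idf`, the add-one smoothing, and the sample statistics are assumptions made for this example and are not taken from the thesis.

```python
import math


def average_idf(query, doc_freq, num_docs):
    """Average smoothed IDF of the query terms (higher suggests a more specific query).

    doc_freq maps a term to the number of documents containing it;
    num_docs is the collection size. Both come from index statistics.
    """
    terms = query.lower().split()
    if not terms:
        return 0.0
    total = 0.0
    for term in terms:
        df = doc_freq.get(term, 0)
        # Add-one smoothing (an assumption here) avoids division by zero
        # for terms that do not occur in the collection.
        total += math.log((num_docs + 1) / (df + 1))
    return total / len(terms)


# Hypothetical collection statistics, for illustration only.
stats = {"index": 90, "merge": 4}
common = average_idf("index", stats, 99)
rare = average_idf("merge", stats, 99)
```

Here `rare` is larger than `common`, because a term that occurs in few documents discriminates better than one that occurs in most of them.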