Research on Query Expansion of Information Retrieval
|School||Guangxi Normal University|
|Course||Computer Software and Theory|
|Keywords||Information retrieval; Query expansion; Retrieval model; Recall; Precision|
With the rapid development of Internet technology, the amount of information on the network has grown explosively. Network technology has broadened our access to information, but while this vast amount of information brings great convenience to daily life, it also causes considerable distress: faced with such a mass of information, people are often at a loss, and information overload has become a dilemma. How to retrieve the information users need from this vast volume has therefore become a very important research topic in the field of information retrieval. Search engines emerged in response. However, because the wording of the query submitted by a user often does not match the wording of the documents, and because queries are often incomplete expressions of the information need, traditional information retrieval cannot satisfy users' query requirements. To solve this problem, scholars have proposed query expansion techniques, which expand and reconstruct the user's initial query terms through certain methods and strategies in order to improve retrieval performance. Query expansion is an effective way to optimize queries, and research on query expansion techniques for information retrieval has important theoretical and practical significance.
The main research work is as follows. First, this thesis describes the research background, purpose, and significance, and gives a brief overview of information retrieval and query expansion. It then introduces the relevant theory and current state of information retrieval and analyzes several traditional query expansion techniques in detail, providing a theoretical basis for this research. Second, it compares the retrieval performance of three traditional information retrieval models: the Boolean model, the vector space model, and the probabilistic model.
By analyzing the retrieval principles and retrieval performance of these models and comparing their advantages and disadvantages, this thesis improves the traditional vector space model and proposes an improved vector space model based on web page structure. Using the structural information of the HTML language, the model divides the text content of a web document into three categories: title, bold text, and body text. According to the position of each block in the document and the importance of its type, each block is assigned a different weight scale factor, and the weights of the terms are readjusted accordingly to better distinguish relevant documents from irrelevant ones, thereby improving retrieval performance. Furthermore, combining the characteristics of different query expansion methods with the advantages and disadvantages of the retrieval models, and building on the improved vector space model, this thesis proposes a pseudo relevance feedback query expansion algorithm based on web page structure and user query behavior. Without changing the user's query behavior, the algorithm uses the improved vector space model and combines the initial retrieval results with the user's browsing behavior to extract relevant documents, and then expands the initial query. Experimental results show that the proposed model significantly improves retrieval performance compared with the traditional tf-idf algorithm, a query expansion algorithm based on local context analysis, and a local feedback query expansion algorithm based on the Apriori algorithm.
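The two ideas summarized above, structure-weighted term weights and pseudo relevance feedback, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The block weights in `BLOCK_WEIGHTS`, the feedback set size `top_k`, the number of expansion terms `n_terms`, the 0.5 weight given to expansion terms, and the `clicked` list standing in for user browsing behavior are all assumptions chosen for illustration, since the abstract does not state the actual values or selection rules. Documents are assumed to be already split into title, bold, and body blocks.

```python
import math
from collections import Counter

# Hypothetical block weights; the abstract does not give the actual scale factors.
BLOCK_WEIGHTS = {"title": 3.0, "bold": 2.0, "body": 1.0}


def weighted_tf(doc):
    """Sum term counts over blocks, scaling each occurrence by its block weight."""
    tf = Counter()
    for block, text in doc.items():
        for term in text.lower().split():
            tf[term] += BLOCK_WEIGHTS[block]
    return tf


def build_index(docs):
    """Return one structure-weighted tf-idf vector (a dict) per document."""
    tfs = [weighted_tf(d) for d in docs]
    df = Counter(term for tf in tfs for term in tf)
    n = len(docs)
    idf = {t: math.log(1 + n / df[t]) for t in df}
    return [{t: w * idf[t] for t, w in tf.items()} for tf in tfs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def search(query_vec, index):
    """Rank document ids by cosine similarity, best first; drop zero scores."""
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(index)]
    return [i for s, i in sorted(scores, key=lambda p: (-p[0], p[1])) if s > 0]


def expand_query(query, index, clicked=(), top_k=2, n_terms=2):
    """Pseudo relevance feedback: add the strongest terms from feedback documents."""
    q = {t: 1.0 for t in query.lower().split()}
    ranked = search(q, index)
    # Feedback set: top-k initial results plus any documents the user opened.
    feedback = list(dict.fromkeys(ranked[:top_k] + list(clicked)))
    centroid = Counter()
    for i in feedback:
        for t, w in index[i].items():
            centroid[t] += w / len(feedback)
    candidates = [(w, t) for t, w in centroid.items() if t not in q]
    for w, t in sorted(candidates, key=lambda p: (-p[0], p[1]))[:n_terms]:
        q[t] = 0.5  # assumed: expansion terms weigh less than original terms
    return q


# Toy collection, already split into the three block types.
docs = [
    {"title": "query expansion", "bold": "feedback",
     "body": "pseudo relevance feedback expands the query"},
    {"title": "cooking pasta", "bold": "sauce",
     "body": "boil the pasta and add sauce"},
    {"title": "retrieval models", "bold": "vector",
     "body": "the vector space model ranks documents for a query"},
]
index = build_index(docs)
expanded = expand_query("query expansion", index, clicked=[0])
ranking = search(expanded, index)
```

In this sketch a term in the title contributes three times as much as the same term in the body, so documents whose titles match the query rise in the initial ranking, and the expansion terms are then drawn only from documents that were ranked highly or opened by the user.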