Research on Automatic Search Engine Performance Evaluation Based on Clustering Analysis
|School||Jiangxi Normal University|
|Course||Computer Science and Technology|
|Keywords||information retrieval; search engine; performance evaluation; clustering analysis|
With the rapid development of the Internet, the amount of information online grows exponentially every day. Finding the information we need among a vast number of pages is like looking for a needle in a haystack, and search engines are the tool that helps people find it quickly. Performance evaluation is an important issue in Web search engine research, and its content and design call for an objective and reliable method.

Traditional search engine evaluation methods require manual annotation of correct answers for a set of queries, which costs considerable human effort and time. In this paper, we present an automatic search engine performance evaluation method based on clustering analysis. The method has three steps: first, computing the coverage score of informational queries; second, clustering the search results by coverage score; and last, evaluating retrieval performance with an evaluation function. We also analyze how the evaluation function is defined in terms of cluster cohesion and cluster separation. Experimental results show that the automatic method produces evaluation results similar to those of traditional assessor-based methods.

The main achievements of the thesis are as follows:
1) Based on large-scale log analysis of Web search engine user behavior, we analyze query logs from Sogou Laboratories and extract informational queries through automatic query type identification using click information. We also design a method for computing the query coverage score.
2) We submit the informational queries to different search engines (Google, Baidu, Bing), then download and preprocess the returned results, which are crawled by a web spider.
3) We build a complete experimental platform for retrieval system evaluation, cluster the search results by coverage score, and evaluate retrieval performance with an evaluation function based on cluster cohesion and cluster separation.
4) Finally, we annotate correct answers for the queries by manual sampling and obtain evaluation results with the traditional search engine evaluation methods. We also carry out and analyze comparative experiments on the two evaluation methods with different functions.
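
The abstract names cluster cohesion and cluster separation as the ingredients of the evaluation function but does not give their formulas. The sketch below is only an illustration under stated assumptions: it takes cohesion to be the usual within-cluster sum of squared errors and separation to be the usual between-cluster sum of squares, and it clusters one-dimensional scores with plain k-means. The function names (`kmeans_1d`, `cohesion`, `separation`) and the sample scores are hypothetical and are not taken from the thesis; the thesis's actual coverage score and evaluation function may differ.

```python
from statistics import mean


def kmeans_1d(scores, k, iterations=50):
    """Cluster 1-D scores with plain k-means (an assumed choice of algorithm)."""
    ordered = sorted(scores)
    # Deterministic initialisation: spread the starting centroids over the sorted scores.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for s in scores:
            nearest = min(range(k), key=lambda i: abs(s - centroids[i]))
            clusters[nearest].append(s)
        # An empty cluster keeps its previous centroid.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return clusters, centroids


def cohesion(clusters, centroids):
    """Within-cluster sum of squared errors; lower means tighter clusters."""
    return sum((s - c) ** 2 for cl, c in zip(clusters, centroids) for s in cl)


def separation(clusters, centroids):
    """Between-cluster sum of squares; higher means better-separated clusters."""
    overall = mean(s for cl in clusters for s in cl)
    return sum(len(cl) * (c - overall) ** 2 for cl, c in zip(clusters, centroids))


if __name__ == "__main__":
    # Hypothetical coverage scores for six search results of one query.
    scores = [0.9, 0.85, 0.8, 0.2, 0.15, 0.1]
    clusters, centroids = kmeans_1d(scores, k=2)
    print("clusters:", clusters)
    print("cohesion:", cohesion(clusters, centroids))
    print("separation:", separation(clusters, centroids))
```

With these definitions, cohesion and separation add up to the total sum of squares of the scores, so a clustering that lowers cohesion necessarily raises separation. An evaluation function could combine the two in several ways; which combination the thesis uses is not stated in the abstract.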