Research on Personalized Search Based on Hybrid Clustering
|School||Guilin University of Electronic Science and Technology|
|Major||Applied Computer Technology|
|Keywords||Personalization; Data mining; Density clustering; Hierarchical clustering; User interest model|
With the development of computer and network technology, the Internet has grown into a huge information space. How to accurately find the information one needs in this ocean of data has become a research topic for more and more scholars. Search engines give users an effective and convenient way to retrieve information from the Internet. However, as information media keep multiplying and user needs become more complex, a single search model applied to all users can no longer meet current needs, and personalized search engines have emerged in this context. Clustering is an important branch of data mining, and the characteristics of clustering algorithms give them particular significance for the development of personalized search engines. This paper analyzes the clustering processes of different clustering algorithms and the characteristics of the data they cluster, focusing on density-based clustering, which can identify clusters of irregular shape, and hierarchical clustering, which is simple and efficient. Based on an analysis of the technical characteristics of personalized search engines, it designs HCPS (Hybrid Clustering in Personalized Search), a hybrid clustering algorithm based on density and hierarchy. Because the PageRank algorithm plays an important role in optimizing search results and helps improve the accuracy of personalized search engines, the paper also designs PRPS (Personalized Ranking in Personalized Search), a personalized ranking algorithm based on PageRank.
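The abstract does not specify HCPS at code level. As an illustration only, the following Python sketch shows the general idea of a density-then-hierarchical hybrid: a DBSCAN-style pass finds dense, arbitrarily shaped groups and flags isolated points as noise, and an average-linkage agglomerative pass then merges those groups. The parameters `eps`, `min_pts`, `k` and `max_merge_dist`, and the choice of average linkage, are assumptions rather than the thesis's definitions; HCPS's outlier degree and user-interest factors are not modeled here.

```python
import math
from itertools import combinations


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def density_phase(points, eps, min_pts):
    """DBSCAN-style pass. Returns (clusters, noise) as lists of point indices."""
    labels = [None] * len(points)  # None = unvisited, -1 = noise, >= 0 = cluster id
    n_clusters = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cid = n_clusters
        n_clusters += 1
        labels[i] = cid
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cid
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point, so keep growing
                queue.extend(j_neighbors)
    clusters = [[i for i, lab in enumerate(labels) if lab == c] for c in range(n_clusters)]
    noise = [i for i, lab in enumerate(labels) if lab == -1]
    return clusters, noise


def average_linkage(points, a, b):
    """Mean pairwise distance between two clusters of point indices."""
    return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def hierarchical_phase(points, clusters, k, max_merge_dist):
    """Merge the closest pair until k clusters remain (k >= 1 assumed) or no pair is close enough."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > k:
        (x, y), d = min(
            (((x, y), average_linkage(points, clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > max_merge_dist:
            break  # remaining clusters are too far apart to merge
        clusters[x].extend(clusters[y])
        del clusters[y]  # safe because combinations() yields x < y
    return clusters


def hybrid_cluster(points, eps, min_pts, k, max_merge_dist):
    """Density pass first, then hierarchical merging of the dense groups."""
    clusters, noise = density_phase(points, eps, min_pts)
    return hierarchical_phase(points, clusters, k, max_merge_dist), noise


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),          # dense group A
           (3, 0), (3, 1), (4, 0), (4, 1),          # dense group B, near A
           (10, 10), (10, 11), (11, 10), (11, 11),  # dense group C, far away
           (20, 0)]                                  # isolated point
    groups, outliers = hybrid_cluster(pts, eps=1.5, min_pts=3, k=2, max_merge_dist=5.0)
    print(groups, outliers)
```

With `eps=1.5` and `min_pts=3`, the density pass finds the three 2x2 groups and flags `(20, 0)` as noise; the hierarchical pass then merges A and B, whose average linkage (about 3.1) is below `max_merge_dist`, leaving two clusters. Running density first lets the cheaper merge step work on a handful of groups instead of every raw point.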
Within the framework of hierarchical clustering, HCPS defines the distance between clusters and the rules and conditions that each merge iteration must satisfy, and it introduces a class outlier degree as the criterion for deciding whether a data point belongs to a cluster. During clustering, HCPS takes into account the key factors of user interest in a personalized search engine, so that the data output after clustering and ranking is closer to the user's search intent. PRPS is an improvement of the PageRank algorithm: by analyzing the iterative process of PageRank, it combines the user interest model with the clustering results of HCPS, assigns each a different coefficient according to its degree of influence, and recalculates a PRank value that replaces the original PageRank value. PRPS ranks search results by both the importance of each web page and its similarity to the user's interests, which addresses the topic drift problem of PageRank. This paper designs and implements an experimental personalized search engine system based on density and hierarchical hybrid clustering, and analysis of the experimental data shows that the HCPS and PRPS algorithms achieve good search accuracy.
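The abstract does not give the exact PRank formula used by PRPS. The sketch below instead uses standard personalized PageRank, a well-known technique swapped in for illustration: the teleport vector is biased by a weighted blend of a user-interest score and a cluster-similarity score of the kind HCPS could supply. The function names, the coefficients `alpha` and `beta`, and the score dictionaries are hypothetical and should not be read as the thesis's method.

```python
def personalized_pagerank(links, bias, d=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank with a non-uniform teleport (personalization) vector.

    links: dict mapping each page to the list of pages it links to.
    bias:  dict mapping pages to non-negative personalization weights.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    total = sum(bias.get(p, 0.0) for p in pages)
    if total > 0:
        v = {p: bias.get(p, 0.0) / total for p in pages}
    else:  # no usable bias: fall back to the uniform vector of ordinary PageRank
        v = {p: 1.0 / len(pages) for p in pages}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is redistributed along v.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - d + d * dangling) * v[p] for p in pages}
        for q, outs in links.items():
            for p in outs:
                new[p] += d * rank[q] / len(outs)  # q shares its rank over its out-links
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


def interest_bias(interest, cluster_sim, alpha=0.6, beta=0.4):
    """Hypothetical PRPS-style weighting: blend a user-interest score and a
    cluster-similarity score with coefficients alpha and beta."""
    pages = set(interest) | set(cluster_sim)
    return {p: alpha * interest.get(p, 0.0) + beta * cluster_sim.get(p, 0.0) for p in pages}


if __name__ == "__main__":
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    baseline = personalized_pagerank(web, {p: 1.0 for p in "ABCD"})
    personal = personalized_pagerank(web, interest_bias({"D": 1.0}, {"D": 0.5, "A": 0.5}))
    print(baseline)
    print(personal)
```

With a uniform bias this reduces to ordinary PageRank, where C ranks highest. Biasing the teleport vector toward D, which has no in-links, raises D's score from 0.15 × 0.25 = 0.0375 to 0.15 × 0.8 = 0.12, because a page without in-links receives only teleport mass. Steering that mass toward pages matching the user's interests is one common way to counter topic drift.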