Web Usage Mining and the Research of Personalized Recommendation
|School||Zhejiang University of Technology|
|Course||Applied Computer Technology|
|Keywords||Data mining, Web usage mining, Personalized recommendation, Apriori algorithm, K-means algorithm|
Data mining is an important research direction in computer science, artificial intelligence, and databases. It extracts implicit, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random real-world data. Web pages contain complex, unstructured, and dynamic data. Analyzing the vast amount of information on the Web and providing personalized recommendation services that match users' needs is an important application of data mining today. Building on the results of previous studies, this thesis carries out research on Web usage mining. The main contents are summarized as follows:
(1) The basic theory and classification of data mining are studied as a whole, and the data sources for Web usage mining and the basic process of data preprocessing are analyzed in detail.
(2) The theory of association rules is presented in detail, the performance of the classical Apriori algorithm is analyzed, and the algorithm is improved. Before the join step that generates candidate sets, a pruning pass is performed first. This reduces the number of itemsets that take part in the join, and with it the size of the generated candidate itemsets, the number of loop iterations, and the running time. The condition check in the join step is also revised to reduce the number of redundant comparisons.
(3) The basic idea and process of the K-means clustering algorithm are described in detail and its strengths and weaknesses are analyzed. A modified K-means algorithm, the MFA algorithm, is proposed. In K-means, determining the new cluster center for each point after every adjustment of the centers requires a large number of distance calculations. The proposed approach uses information about how the cluster centers have changed to determine the new cluster center, and a filtering method that selects from a dynamic candidate set of cluster centers reduces the computational cost.
(4) Log data from a campus Web site are analyzed and processed, and the improved mining algorithms are applied to discover users' access patterns. Finally, the mining results are used to add a personalized recommendation feature to the site, which proactively recommends information that may interest each user.
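For readers unfamiliar with the join and prune steps referred to in item (2), the following is a minimal sketch of the classical Apriori procedure, not the improved variant proposed in this thesis. The function name `apriori`, the variable `sessions`, the page names, and the choice to express `min_support` as an absolute count are all assumptions made for illustration; they do not come from the thesis.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Classical Apriori: return {itemset: support_count} for frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        candidates = set()
        # Join step: combine pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in prev for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: one scan over the transactions for this level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical user sessions: each set holds the pages visited in one session.
sessions = [
    {"home", "news", "library"},
    {"home", "news"},
    {"home", "library"},
    {"news", "library"},
    {"home", "news", "library"},
]
result = apriori(sessions, min_support=3)
```

In this sketch the classical algorithm joins first and prunes afterwards; the improvement summarized in item (2) moves a pruning pass ahead of the join so that fewer itemsets take part in it.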