Efficient Frequent Item Set Discovery Methods and Improved Apriori
|School||Jiangsu University of Science and Technology|
|Course||Computer Science and Technology|
|Keywords||Frequent itemsets; Candidate set; Related support; Apriori; Association rules|
With the rapid development of information technology, data generation and storage have reached an unprecedented scale. Extracting potentially useful information from such vast amounts of data poses a severe test for traditional data processing techniques, and data mining methods have emerged in response. Improving the quality of the mined information and the efficiency of mining has become the core of data mining research. Association rule mining is one of the primary means of data mining, and improving both the efficiency of mining association rules and the usefulness of the rules themselves has become a research hotspot in recent years. This paper analyzes the two stages of the Apriori association rule mining algorithm, namely generating frequent itemsets and generating association rules, and proposes improvements to both. First, candidate set generation produces many redundant itemsets, especially among the candidate 2-itemsets, and requires repeated scans of the database; this is the main bottleneck of the Apriori algorithm. Second, the generated association rules contain many redundant or uninteresting rules, which bring confusing or even misleading information into the customer's decision-making process. To address these problems, the method in this paper scans the database only once when generating frequent 2-itemsets and does not need to generate a large number of candidate 2-itemsets: it counts all possible two-item combinations and then filters out the frequent 2-itemsets directly using the support threshold. To deal with redundant association rules, a third measure, related support, is introduced, and two properties of association rules under related support are used to eliminate redundant rules.
Once related support is introduced, the original properties used to reduce redundant rules would also require the related support of each rule to be evaluated. For this reason, two further properties are derived mathematically to improve the efficiency of association rule generation, and Chapter 3 gives an experimental comparison of the algorithms' efficiency. Finally, with suitable support, confidence, and related support thresholds selected, the efficient association rule mining algorithm is applied to part of the website log data of Guangdong Industry Technical College. The mining results are analyzed in detail, with the ultimate aim of increasing the site's page views.
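The single-scan generation of frequent 2-itemsets described in the abstract can be illustrated with a minimal Python sketch. The function name `frequent_pairs`, the toy transactions, and the use of an absolute support count as the threshold are assumptions made for illustration; this is not the paper's implementation, and it does not cover related support, which the abstract does not define.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support_count):
    """Return the 2-itemsets whose support count meets the threshold.

    The database is scanned once: every pair of items that co-occurs in a
    transaction is counted directly, so no candidate 2-itemsets are
    generated by joining frequent 1-itemsets first.
    """
    pair_counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each pair has one canonical ordering.
        items = sorted(set(transaction))
        pair_counts.update(combinations(items, 2))
    # Filter by the support threshold after the single pass.
    return {pair: n for pair, n in pair_counts.items() if n >= min_support_count}


# Hypothetical toy data, e.g. pages visited per session.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c", "d"],
]
result = frequent_pairs(transactions, 2)
print(result)
```

In this sketch the trade-off is memory for scans: counts are kept for every pair that actually occurs, which may be large for long transactions but avoids re-reading the database.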