
Efficient Frequent Item Set Discovery Methods and Improved Apriori

Author: ChangShaoChun
Tutor: WuBin
School: Jiangsu University of Science and Technology
Course: Computer Science and Technology
Keywords: Frequent itemsets; Candidate set; Related support; Apriori; Association rules
CLC: TP311.13
Type: Master's thesis
Year: 2011
Downloads: 53
Quotes: 3

With the rapid development of information technology, data generation and storage have reached an unprecedented scale. Extracting potentially useful information from such vast amounts of data poses a severe challenge to traditional data processing techniques, and data mining methods have emerged in response. Since data mining appeared, improving both the quality of the extracted information and the efficiency of mining has been the core of data mining research. Association rule mining is one of the primary and most effective means of data mining, and improving the efficiency of mining association rules, as well as the quality of the rules themselves, has become a research hotspot in recent years. This thesis analyzes the Apriori association rule mining algorithm in terms of its two phases, generating frequent itemsets and generating association rules, and identifies two main aspects that can be improved. First, candidate generation produces many redundant itemsets, especially among the candidate 2-itemsets, and the database must be scanned repeatedly; this is the main bottleneck of the Apriori algorithm. Second, the generated association rules contain a great deal of redundancy, and uninteresting rules can confuse or even mislead customers during decision making. To address the first problem, this thesis generates the frequent 2-itemsets with a single database scan: instead of generating a large set of candidate 2-itemsets, it counts the occurrences of all possible 2-item combinations and then directly selects the frequent 2-itemsets using the minimum support threshold. To address the second problem, a third measure, related support, is introduced, and two properties of related support are used to eliminate redundant association rules.
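The one-scan idea for frequent 2-itemsets can be illustrated with a minimal Python sketch. It assumes that transactions are given as collections of item names and that the minimum support is a fraction of the total number of transactions; the function name `frequent_2_itemsets` is hypothetical, and the abstract does not describe the thesis's actual data structures.

```python
from collections import Counter
from itertools import combinations


def frequent_2_itemsets(transactions, min_support):
    """Hypothetical helper: find frequent 2-itemsets with one pass over the data.

    Instead of building candidate 2-itemsets from frequent 1-itemsets, every
    2-item combination occurring in a transaction is counted directly, and the
    pairs whose relative support reaches `min_support` (a fraction) are kept.
    Returns a dict mapping sorted item pairs to their relative support.
    """
    pair_counts = Counter()
    n = 0
    for t in transactions:          # the single scan of the database
        n += 1
        items = sorted(set(t))      # drop duplicates; fixed order makes (a, b) canonical
        pair_counts.update(combinations(items, 2))
    if n == 0:
        return {}
    # Filter all counted pairs by the minimum support threshold.
    return {
        pair: count / n
        for pair, count in pair_counts.items()
        if count / n >= min_support
    }
```

For example, with five market-basket transactions and `min_support=0.6`, only pairs that occur in at least three transactions are returned. The trade-off of this design is that a transaction with m distinct items contributes m(m-1)/2 pairs to the counter, so memory grows with the number of distinct pairs actually observed rather than with the size of a generated candidate set.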
Because, once related support is introduced, the original approach of using properties of association rules to reduce redundant rules still requires the support of each rule to be determined, this thesis also derives two properties mathematically to improve the efficiency of association rule generation; Chapter 3 gives an experimental comparison of the algorithm's efficiency. Finally, appropriate support, confidence, and related support thresholds are selected, and the efficient association rule mining algorithm is applied to part of the log data of the Guangdong Industry Technical College website to mine association rules. The mining results are analyzed in detail and ultimately used to increase the site's page views.
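The rule-generation phase can be sketched with the standard support/confidence framework, where a rule X → Y is kept if conf(X → Y) = count(X ∪ Y) / count(X) reaches the minimum confidence. This sketch deliberately omits the thesis's related support measure and its two derived properties, because the abstract does not define them; the function name `generate_rules` and the use of integer support counts are assumptions made for illustration.

```python
from itertools import combinations


def generate_rules(counts, min_confidence):
    """Hypothetical helper: derive association rules X -> Y from frequent itemsets.

    `counts` maps each frequent itemset (a frozenset) to its integer support
    count. By the Apriori property every non-empty subset of a frequent itemset
    is also frequent, so its count is expected to be present in `counts`.
    Related support is NOT modeled here; only confidence filters the rules.
    Returns (antecedent, consequent, support_count, confidence) tuples.
    """
    rules = []
    for itemset, count in counts.items():
        if len(itemset) < 2:
            continue                # a rule needs a non-empty antecedent and consequent
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                x = frozenset(antecedent)
                confidence = count / counts[x]
                if confidence >= min_confidence:
                    rules.append((x, itemset - x, count, confidence))
    return rules
```

Integer counts are used so that confidence is a single correctly rounded division, which avoids threshold comparisons drifting because of floating-point support fractions.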
