The Research of Incremental Update Association Rule Mining Methods
|School||Harbin Engineering University|
|Course||Applied Computer Technology|
|Keywords||association rules frequent item-set temporal constraint fuzzy theory graph|
Today’s society is an information society, the information is changing rapidly. Whenhuge information brings conveniences to people, it also brings many problems: The first isthat the information is so excessive that it is difficult to digest; the second is that it isdifficult to distinguish between true and false from the information; the third is that it isdifficult to ensure the security of the information; the forth is that it is difficult to deal withthe information because of their forms are not uniform. How can we not be overwhelmed byinformation, but from which to discover useful knowledge and improve informationutilization? Faced with this challenge, data mining techniques have emerged. Associationrule is a very important branch of data mining, it can find the relationship between things,and then people can get the potential valuable information among data. The database willalways continue to change over time, so how to efficiently update the already foundassociation rules from the updated database becomes the focus of people’s inquiry.The existing mature incremental update association rule mining methods can bedivided into two categories: one based on Apriori algorithm, such as FUP1, FUP2and so on,and the other based on FP tree algorithm, such as FIUA2and so on. These two kinds ofalgorithms have their own shortcomings respectively, the defect of the former is that theyneed to frequently scan the database, time-consuming, the defect of the latter is that theyneed to generate conditional FP tree many times, space-consuming. This paper summarizesthe advantages and disadvantages of the previous algorithms and gives association ruleincremental update mining algorithms based on graph, the algorithms fully consider theneeds of mining, scan the database only once, and reduce the generation of candidate set, bydoing these they not only improve the space utilization but also improve the efficiency ofmining. The work of this paper is as following:Firstly, we have an in-depth research on the existed classical algorithms and theirimproved algorithms, including Apriori, FP tree, FUP, DLG and so on; we analyze theadvantages and disadvantages of these algorithms. We also have a discussion and learningon the novel techniques of recent algorithms, such as the processing of the numericaldataset, the concept of fuzzy constraints and so on. Secondly, the paper gives the four-fork chain graph storage structure, analyzes theadvantages of the introduced structure, gives the entirely frequent item-set mining algorithmGIU1and maximal frequent item-set mining algorithm GIU2, and also gives the algorithmdescriptions and instance demonstrations.Additionally, because of the advantages of graph, the Graph structure is extended tothe application of fuzzy temporal dataset incremental update mining, the paper gives fuzzytemporal incremental update entirely frequent item-set mining algorithm FuzzyGIU basedon the structure, analyzes the rationality and effectiveness of the algorithm, and also givesthe algorithm description and instance demonstration.Finally, we do experiments on these algorithms and have some performancecomparisons with the associated existed algorithms. The results show that these algorithmsbased on the gived storage structure of graph have better performances than the existedalgorithms when the dataset size and minimum support change respectively, thus theefficiencies of these algorithms are verified.