Dissertation > Industrial Technology > Automation technology,computer technology > Computing technology,computer technology > Computer software > Program design,software engineering > Programming > Database theory and systems

Research of Frequent Itemsets Mining Algorithms Based on Vertical Data Format

Author ChenShuai
Tutor HuangGuoYan
School Yanshan University
Course Applied Computer Technology
Keywords frequent itemsets frequent closed itemsets vertical data format RI-DAG rowset weight constraint
CLC TP311.13
Type Master's thesis
Year 2012
Downloads 34
Quotes 0
Download Dissertation

Frequent patterns mining is a key technology in the field of data mining.The existing algorithms mining frequent itemsets have high efficiency inmining frequent patterns. However, with the increasing refinement of theapplication fields,datasets with different characteristics appeared. And thesedatasets with different characteristics led the traditional static databasemining techniques to be invalid. Therefore, how to make highly compressedstorage and efficient mining is a research hotspot. What is more, how toadopt reasonable constraint method and mine interesting patterns efficienctlyaccording to different needs is another research hotspot. To handle theseproblems, this paper has mainly focused on the research of new algorithmsfor mining frequent patterns. These new algorithms can be used in sequenceanalysis, customers’ purschasing behavior pattern forecast and softwaresecurity analysis and so on.Firstly, an algorithm based on RIHS-Tree for mining frequent closeditemsets in high dimensional datasets is proposed. It uses vertical data formatto store data and constructs RIHS-Tree according to the transposed dataset.The tree stores the rowsets sharing a common rowid prefix, according tosome predefined order of rowids. It adopts a bottom-up search strategy to traverse the RIHS-Tree. In the process of mining, we use therowsets-inclusion strategy to implement pattern growth and then obtain largerowsets, as well as the corresponding frequent closed itemsets.Secondly, an algorithm based on directed acyclic graph for miningTop-k frequent closed itemsets is proposed. It converses the whole datasetinto a transposed table and construct directed acyclic graph according to therowsets in the transposed table. It adopts a close-checking method to generateall frequent closed itemsets.Finally, an algorithm based on vertical data format for mining weightedfrequent itemsets is presented, which can efficiently mine interestingpatterns which users are satisfied in. Vertical data format is adopted in orderto calculate the support of itemsets and it classifies the transposed dataaccording to minimum items. The classes of minimum items are obtained bycombining minimum items with other items or the combinations of theseitems. The notion of the weighted valid extension (wv) property is proposed.Based on wv and hash table which stores weighted non-frequent2-itemsets,pruning is applied to reduce the candidate itemsets. Finally, all of frequentitemsets with constraint can be mined. The algorithm used in this article is written in JAVA. Synthetic datasetsand real datasets are adopted for mining frequent itemsets in the experiments.

Related Dissertations
More Dissertations