
Association Rule Data Mining Algorithms Based on Maximum Frequent Sets

Author SongWeiLin
Tutor XuHuiMin
School Beijing University of Posts and Telecommunications
Course Circuits and Systems
Keywords Data mining; KDD (Knowledge Discovery in Databases); association rules; sequence pattern; the DMFIA algorithm; the ISS_DM algorithm; maximum frequent item sets; maximum frequent item sequence sets; maximum frequent customer sequence sets
CLC TP311.13
Type PhD thesis
Year 2006

Data mining is a technique for analyzing and understanding large volumes of source data and revealing the knowledge hidden in them. It has been viewed as an important evolution in information processing. Over the past decade or more, the concepts and techniques of data mining have been presented, and some of them have been discussed at a deeper level in the last few years. Like other new techniques, however, data mining must develop gradually, from concept creation through acceptance of its importance, wide discussion, and a few usage attempts to large-scale application. Most experts consider it to be in the wide-discussion phase today; it still needs theoretical study and algorithmic exploration. Association rule mining is an important branch of data mining that has produced many valuable results, but a good many challenging problems remain open. For large databases, research on improving mining performance and precision is necessary, so much of today's work on association rule mining focuses on new mining theories, new algorithms, and improvements to existing methods. This thesis reviews the current state and development trends of data mining technology and association rules, and then develops related work based on the maximum frequent item sets of association rules. Drawing on the ideas of the DMFIA algorithm, which mines maximal frequent item sets based on an FP-tree, the thesis proposes a new maximum frequent item set algorithm for customer databases that uses different methods of data analysis and flexibly adjusts the minimum support number. The new algorithm can analyze data in different ways and effectively reduces execution time when mining very large data sets. As a result, it can improve mining efficiency and satisfy a range of user requirements.
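To make the notion of a maximum (maximal) frequent item set concrete, the following is a minimal Python sketch. It is not the DMFIA algorithm and not the new algorithm proposed in the thesis: it uses a plain level-wise (Apriori-style) search rather than an FP-tree. The function name `maximal_frequent_itemsets`, its parameters, and the sample data are assumptions made for illustration. An item set is treated as frequent if its support number (the number of transactions containing it) is at least the minimum support number, and as maximal if no proper superset of it is also frequent.

```python
def maximal_frequent_itemsets(transactions, min_support):
    """Return the maximal frequent item sets of `transactions`.

    Illustrative level-wise search only; hypothetical helper, not DMFIA.
    `min_support` is an absolute count (the "minimum support number").
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Support number: how many transactions contain every item of the set.
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support]

    frequent = []
    while level:
        frequent.extend(level)
        # Join step: unite pairs of frequent k-sets that differ in one item.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(c) >= min_support]

    # Keep only the sets that have no frequent proper superset.
    return [s for s in frequent if not any(s < other for other in frequent)]


# Made-up sample data: five transactions over the items a, b, c.
sample_transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
```

Raising the minimum support number in this sketch shrinks the maximal sets, which is the kind of adjustment the abstract refers to: with a minimum support number of 2 the only maximal frequent item set in the sample is {a, b, c}, while with 3 the maximal sets are the three pairs.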
Further analysis shows that neither the DMFIA algorithm nor the new algorithm above can effectively mine a customer sequence view database. Drawing on the ideas of those algorithms, the thesis proposes another new algorithm that incorporates sequential patterns: the item-level maximum frequent sequence sets algorithm based on sequential patterns. The algorithm iterates over the items whose support number is not less than the minimum support number (s), taking them in ascending order of support number. If elements of MFCS_d contain transaction items whose support number is not less than that of the frequent item currently being processed, those elements are selected to form MFCSk. The support number (flag) of each element of MFCSk is computed from the backup table of MFCS. If flag >= s’ (usually s = s’), the element (a customer sequence set) is output to the maximum frequent sequence sets MFS_d. If the condition is not satisfied, the transactions of the customer sequence sets are combined with one another to form a set, and the elements of that set are selected for further iteration. The algorithm finishes executing when MFCS_d is empty. Building on the item-level algorithm, the thesis also proposes the transaction-level maximum frequent sequence sets algorithm based on sequential patterns. This algorithm iterates over the transactions of each customer sequence set whose support number is not less than the minimum support number (s), taking them in ascending order of support number. Data are then selected and iterated over in largely the same way as in the item-level algorithm.
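The two building blocks this description relies on, the support number of a sequence in a customer sequence database and the ordering of frequent items by ascending support number, can be sketched in Python as follows. This is not the thesis's algorithm: the MFCS_d, MFCSk, and MFS_d bookkeeping is omitted, and the containment rule used here (each element of the pattern must be a subset of a distinct transaction, in order) is the usual sequential-pattern definition, assumed rather than taken from the thesis. All function names and the sample data are hypothetical.

```python
def contains(customer_sequence, pattern):
    """True if `pattern` occurs in `customer_sequence` as a subsequence.

    Both arguments are lists of item sets, ordered by transaction time.
    Assumed rule: every pattern element is a subset of a distinct,
    later-or-equal-positioned transaction, preserving order.
    """
    pos = 0
    for element in pattern:
        element = frozenset(element)
        # Greedily advance to the first transaction that covers this element.
        while (pos < len(customer_sequence)
               and not element <= frozenset(customer_sequence[pos])):
            pos += 1
        if pos == len(customer_sequence):
            return False
        pos += 1  # the next element must match a later transaction
    return True


def support_number(database, pattern):
    """Number of customers whose sequence contains `pattern`."""
    return sum(1 for seq in database.values() if contains(seq, pattern))


def frequent_items_ascending(database, s):
    """Items with support number >= s, ordered from small to large support.

    Ties are broken alphabetically so the order is deterministic.
    """
    items = sorted({i for seq in database.values() for t in seq for i in t})
    counts = {i: support_number(database, [{i}]) for i in items}
    return sorted((i for i in items if counts[i] >= s),
                  key=lambda i: (counts[i], i))


# Made-up customer sequence database: customer id -> ordered transactions.
customers = {
    1: [{"a"}, {"b", "c"}],
    2: [{"a", "b"}, {"c"}],
    3: [{"b"}, {"a"}],
}
```

In this sample, the sequence "a, then c" is contained in the sequences of customers 1 and 2 but not customer 3, so its support number is 2; with s = 2, the frequent items in ascending order of support number are c, a, b.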
The thesis then describes the ISS_DM algorithm for mining maximal frequent item sequence sets. Because that algorithm cannot effectively mine a customer sequence view database, the thesis proposes an improved ISS_DM algorithm that incorporates sequential patterns. The algorithms were then validated. The results show that, when both algorithms are applied to the same data, the improved algorithm has a shorter execution time and good efficiency. Finally, the thesis describes the problem of data mining on the multi-dimensional model of a data warehouse. The item-level maximum frequent sequence sets algorithm based on sequential patterns and the improved ISS_DM algorithm are each combined with the multi-dimensional data warehouse model, giving versions of both algorithms built on that model. In conclusion, through the study of the maximum frequent item sequence sets of the DMFIA and ISS_DM algorithms, the thesis proposes a series of new algorithms. Experimental results confirm the validity and practicality of the new algorithms and indicate their novelty and theoretical value. The new algorithms also show good application prospects in terms of data mining efficiency and usability for mining large-scale databases.
