Research on Frequent Itemsets Mining Algorithm Based on FP-Tree
|School||Xi'an University of Electronic Science and Technology|
|Course||Computer Software and Theory|
|Keywords||Data Mining Frequent Itemsets Mining FP-Tree FP-Growth|
Data mining has been widely studied and applied in recent years, and frequ-ent itemsets mining is a key step in some problems of data mining, such as ass-ociation rules mining and sequential patterns mining etc, so its study has great t-heoretical and practical value.The main work in this paper focuses on the improvement of FP-Growth witch is aclassical algorithm in frequent itemsets mining. Note that the recursive mining processrequires a large number of conditional FP-Trees to save the projection information inthe FP-Growth, and this leads to an excessive consumption of memory. The reason isthat the sequence of processing items of original FP-Growth algorithm coincides withthat of its projecting directions, that is, forward processing is corresponding to forwardprojection. Thus, if the original FP-Tree is directly used to save the projectioninformation, the information to be used later will be overwitten, resulting in loss of theinformation. To overcome the drawbacks of FP-Growth, the conditional FP-Trees areused to save the projection information first. Then, the order of the processing items ororder of the projection directions of items in FP-Growth is adjusted, so that the originalFP-Tree can store projection information directly, and all the following work can besuccessfully done without wasting memory based on the original FP-Tree. Therefore,any conditional FP-Tree is no longer needed. Based on all above, an improvedalgorithm called NCTree-Growth is proposed, and its two different implementationschemes: backward processing with forward projectio and forward processing withbackward projection, are presented in details. Moreover, their pseudo-codes are givenrespectively, and their comparison is made. Finally, NCTree-Growth algorithm isfurther improved by fullymaking use of the existing results. As a result, the improvedalgorithm can reduce the times of projection and improve the efficiency.The simulation results indicate that the memory consumption of the NCTree-Growth is less than that of FP-Growth, and its performance is better than that o-f FP-Growth.