Research and Realization of Frequent Travel Pattern Discovery Algorithm for Mass Travel Data
|School||Beijing Jiaotong University|
|Course||Computer Science and Technology|
|Keywords||mass data; frequent pattern; maximum frequent itemsets; data partition; transaction|
The volume of stored information has grown ever faster over time, making it possible to mine and analyze accumulated mass travel data to discover passenger travel behavior. Within data mining, frequent pattern mining on travel data can uncover the travel behavior rules hidden in passenger data. Because of the favorable properties of maximum frequent itemsets, this paper focuses on researching and implementing an algorithm for discovering maximum frequent itemsets in massive passenger travel data. The main contributions are as follows.

First, this paper reviews the implementations of classical frequent pattern discovery algorithms such as Apriori, FP-growth, and FPmax*, examines how they work, and discusses their advantages and limitations.

Second, it describes the characteristics of mass passenger datasets: the data are massive and highly sparse, the base is huge, and the itemsets and patterns are relatively short. Such datasets require a very low support threshold for frequent pattern mining.

Third, it proposes the concept of a composite itemset structure suited to frequent itemset mining on mass travel datasets. Its members differ in granularity, hierarchy, and logical concept, though not in physical realization. The paper also defines community-level frequent pattern discovery.

Fourth, it focuses on the concepts of TDP (Transaction-based Data Partition) and TDPFP (Transaction-based Data Partition Frequent Pattern Discovery). It proves theoretically a favorable property of TDPFP: after TDP is applied, the result set mined from each sub-dataset can be merged directly into the overall result set, which then needs only a small amount of redundancy testing rather than extensive superset or subset checking. Experiments show that TDPFP performs well in both space and time.

Finally, the paper improves on TDPFP by presenting and implementing TDPIU, an incremental updating algorithm for frequent pattern discovery based on TDPFP. Experiments show that TDPIU handles incremental mining of the dataset efficiently.
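
To make the central notion concrete, the following is a minimal sketch of what a maximum (often called maximal) frequent itemset is, assuming the usual definition: an itemset that meets the support threshold and has no frequent proper superset. It uses brute-force enumeration for clarity and is not the TDPFP or TDPIU algorithm described in this abstract. The function names, the toy `trips` data, and the threshold are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions.

    Brute-force enumeration of every subset of every transaction; this is
    only practical for tiny illustrative inputs.
    """
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_support}


def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}


# Hypothetical toy data: each transaction is the set of stations on one trip record.
trips = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "C"},
    {"B", "D"},
    {"A", "C"},
]

frequent = frequent_itemsets(trips, min_support=2)
maximal = maximal_itemsets(frequent)
print(sorted(sorted(s) for s in maximal))
```

With this toy input, seven itemsets are frequent, but a single maximal itemset, {A, B, C}, summarizes all of them, since every subset of a frequent itemset is itself frequent. This compactness is the kind of property that makes such itemsets attractive for very large, sparse datasets mined at low support.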