Research on Parallel Frequent Graph Pattern Mining
|School||Harbin Institute of Technology|
|Course||Computer Science and Technology|
|Keywords||frequent subgraph pattern mining frequent closed graph pattern mining parallel algorithm dynamic load balancing|
As a general data structure, graphs have become increasing important in modeling sophisticated structures and their interactions, with broad applications including chemical informatics, bioinformatics, computer vision, video indexing, text retrieval, and Web analysis. Mining frequent subgraph patterns for further characterization, discrimination, classification, and cluster analysis becomes an important task.Among the various kinds of graph patterns, frequent substructures are the vary basic patterns that can be discovered in a collection of graph. They are useful for characterizing graph sets, discriminating different groups of graphs, classifying and clustering graph, building graph indices, and facilitating similarity search in graph databases. Recent studies have developed several graph mining methods and applied them to the discovery of interesting patterns in various applications. For example, there have been reports on the discovery of active chemical structures in HIV-screening datasets by contrasting the support of frequent graphs between different classes. However, the frequent graph mining algorithms can’t achieve good performance when the minimum support is very low. We present the parallel graph mining algorithm base on cluster parallel environment.The main results are follows:Mining frequent graph patterns plays an important role in data mining. Base on the spirit of algorithm gSpan, an algorithm of parallel frequent graph pattern mining using dynamic load balance strategy on cluster is proposed in this paper. The algorithm effectively implements parallel frequent graph pattern mining by splitting the DFS lexicographic tree, maintaining the overload queue, and restricting the granularity. The theoretical analysis and experiment results show that our method on parallel frequent graph pattern mining improved the performance remarkably.According to the algorithm CloseGraph which mining frequent closed graph pattern, we give a method which substitute detecting failure of early termination. We implement the CloseGraph, and give the parallel frequent closed graph mining algorithm. The experiment results prove the analysis of the algorithm, this algorithm has good performance.