Research on Multidimensional Mining and Identification of P2P Flow
|School||Huazhong University of Science and Technology|
|Course||Computer System Architecture|
|Keywords||Stream mining; P2P flow identification; Multidimensional clustering; IP entropy|
The emergence of applications built on streaming-media distribution networks, such as P2P streaming, has led to exponential growth in network traffic. This traffic is mixed with DDoS (Distributed Denial of Service) attacks, worms, and other anomalous traffic, which seriously threatens the stable and normal operation of the network. In-depth analysis of the composition of network traffic, in order to understand its nature, proportions, and changes and to take appropriate measures, has therefore become the primary task of current network management.
Hierarchical clustering of network flows over the five-tuple srcIP, dstIP, Protocol, srcPort, and dstPort is called multidimensional clustering of network flows. Based on an analysis of the original multidimensional clustering algorithm and of the hierarchical structure of its multidimensional clustering tree, this thesis improves the original algorithm. It first performs triple clustering on Protocol, srcPort, and dstPort to mine triple rules, and then binds these rules to the single-dimensional clustering results for srcIP and dstIP to derive the significant five-tuple rules, which completes the multidimensional clustering. Two new methods handle the diamond structure peculiar to the multidimensional clustering tree and avoid repeated derivation and repeated matching: constructing the tree by top-down and bottom-up passes, and directly restricting each duplicate node to be derived in only one branch. Both shorten the NetFlow table against which each multidimensional rule is matched and reduce the number of multidimensional rules that must be matched against the NetFlow table, thereby improving the efficiency of the original algorithm. On the multidimensional clustering results, an IP entropy is defined from the srcIP and dstIP distributions of each multidimensional rule; it describes how dispersed those distributions are.
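As an illustration of the IP entropy idea, the following minimal sketch measures how dispersed srcIP and dstIP are within each group of flows. It rests on several assumptions that the abstract does not confirm: the entropy is taken to be the Shannon entropy of the empirical per-flow IP distribution, each exact (Protocol, srcPort, dstPort) triple stands in for one mined rule, and the names `shannon_entropy` and `ip_entropy_per_rule` as well as the sample flows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    # Sum of p * log2(1 / p) over the distinct values, with p = c / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def ip_entropy_per_rule(flows):
    """Group flows by (protocol, src_port, dst_port) and measure IP dispersion.

    Assumption: each exact triple stands in for one mined rule; the thesis's
    rules and its exact entropy definition may differ.
    Returns {triple: (src_ip_entropy, dst_ip_entropy)}.
    """
    src_ips = defaultdict(list)
    dst_ips = defaultdict(list)
    for src_ip, dst_ip, protocol, src_port, dst_port in flows:
        rule = (protocol, src_port, dst_port)
        src_ips[rule].append(src_ip)
        dst_ips[rule].append(dst_ip)
    return {
        rule: (shannon_entropy(src_ips[rule]), shannon_entropy(dst_ips[rule]))
        for rule in src_ips
    }


# Hypothetical flows as (srcIP, dstIP, Protocol, srcPort, dstPort).
flows = [
    # Many hosts talking to many hosts: both sides are dispersed.
    ("10.0.0.1", "10.0.1.1", 6, 6881, 6881),
    ("10.0.0.2", "10.0.1.2", 6, 6881, 6881),
    ("10.0.0.3", "10.0.1.3", 6, 6881, 6881),
    ("10.0.0.4", "10.0.1.4", 6, 6881, 6881),
    # Many clients talking to one server: only the source side is dispersed.
    ("10.0.0.1", "10.0.2.9", 6, 40000, 80),
    ("10.0.0.2", "10.0.2.9", 6, 40000, 80),
    ("10.0.0.3", "10.0.2.9", 6, 40000, 80),
    ("10.0.0.4", "10.0.2.9", 6, 40000, 80),
]

entropies = ip_entropy_per_rule(flows)
for rule, (h_src, h_dst) in entropies.items():
    print(rule, h_src, h_dst)
```

In this toy data the first group has an entropy of 2 bits on both the source and destination side, while the second group is dispersed only on the source side. How the thesis combines such values into its P2P indicator is not stated in the abstract, so that step is not reproduced here.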
Building on IP entropy, the thesis defines the bidirectional IP prefix and a P2P flow indicator, sp2p, to identify P2P flows. The sp2p value is computed from the srcIP and dstIP of each multidimensional rule, and its magnitude determines whether that rule represents a P2P flow. Finally, the performance and functionality of the system are tested on WAN and LAN NetFlow data. The experimental results show that the improved multidimensional clustering algorithm reduces the time complexity of the original algorithm; that mining multidimensional flows also gives a clear picture of the composition of current network traffic; and that the system can identify the various P2P flows that account for a large proportion of total network traffic, such as BitTorrent and PPLive.