Research on Peer-to-Peer Traffic Identification Algorithm Based on Cluster Analysis
|School||Changsha University of Science and Technology|
|Course||Applied Computer Technology|
|Keywords||Clustering Analysis; P2P Traffic Identification; P2P Traffic Feature; BIRCH; Bayesian Information Criterion|
With the development of modern Internet applications, P2P (peer-to-peer) has become one of the fastest-growing classes of network application. Because of its advantages for file sharing and distributed computing, P2P technology has been widely adopted in recent years, and P2P traffic now makes up a major part of network traffic. This traffic occupies network resources excessively, which causes congestion, introduces potential security hazards, and hinders normal network services. Although P2P brings convenience, P2P traffic must be managed to guarantee the normal operation of the network. As P2P technology has developed, it has adopted traffic-hiding techniques such as dynamic ports (port hopping) and encryption of protocol fields and payloads, which pose serious challenges for traffic identification. Identification methods that rely on explicit features, such as port numbers and payload content, have gradually become ineffective. Effective identification of P2P traffic has therefore become an urgent problem.

This paper studies P2P traffic identification based on clustering analysis. Its main contributions are as follows.

First, starting from the background and significance of P2P traffic identification, the state of research in China and abroad, and the problems raised by the development of P2P technology, the paper surveys several typical P2P traffic identification methods and analyzes their characteristics and the problems they encounter when identifying P2P traffic.

Second, the paper analyzes the characteristics of P2P traffic in depth. Through experiments, it selects five feature attributes that significantly distinguish P2P traffic, and proposes using the download/upload speed ratio as an attribute for identifying P2P traffic.
The combination of these five attributes retains as much characteristic information about P2P traffic as possible while largely reducing redundancy among the attributes, so P2P traffic can be identified more efficiently and accurately.

Finally, the paper presents a P2P traffic identification algorithm based on clustering analysis. The algorithm narrows the scope of the clustering problem by decomposing the data set into sub-clusters, which reduces the complexity of I/O processing. By adding the Bayesian Information Criterion (BIC) to the algorithm, the method can choose the optimal clustering model and thus divide the clusters automatically, which also reduces the influence of human factors on the identification process. Experiments show that the algorithm achieves higher accuracy and a lower misjudgment rate.
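
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea: compress the data into sub-clusters in one pass, cluster the sub-clusters, and let BIC pick the number of clusters. Everything specific here is an assumption for illustration: the flat list of clustering features (standing in for a BIRCH CF-tree), the weighted k-means step, the spherical shared-variance Gaussian form of BIC, the function names, the threshold, and the synthetic two-dimensional "flow features".

```python
import math
import random


def build_subclusters(points, threshold):
    """One pass over the data: absorb each point into the nearest clustering
    feature (CF) when that CF's centroid lies within `threshold`, else open a
    new CF. A CF is [n, linear_sum, squared_sum]; the flat list is a
    simplification of a real CF-tree."""
    cfs = []
    for p in points:
        nearest, nearest_d = None, math.inf
        for cf in cfs:
            d = math.dist(p, [s / cf[0] for s in cf[1]])
            if d < nearest_d:
                nearest, nearest_d = cf, d
        if nearest is not None and nearest_d <= threshold:
            nearest[0] += 1
            nearest[1] = [s + x for s, x in zip(nearest[1], p)]
            nearest[2] += sum(x * x for x in p)
        else:
            cfs.append([1, list(p), sum(x * x for x in p)])
    return cfs


def merge(cfs):
    """CFs are additive, so a cluster's statistics are the sums of its members'."""
    n = sum(cf[0] for cf in cfs)
    ls = [sum(col) for col in zip(*(cf[1] for cf in cfs))]
    ss = sum(cf[2] for cf in cfs)
    return n, ls, ss


def cluster_subclusters(cfs, k, iterations=50):
    """k-means over CF centroids with farthest-point initialisation; centres
    are updated from merged CFs, so each CF counts with its point weight.
    Returns the merged CF of every non-empty cluster."""
    centroids = [[s / cf[0] for s in cf[1]] for cf in cfs]
    centers = [centroids[0]]
    while len(centers) < k:
        centers.append(max(centroids, key=lambda c: min(math.dist(c, m) for m in centers)))
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for cf, c in zip(cfs, centroids):
            j = min(range(len(centers)), key=lambda i: math.dist(c, centers[i]))
            groups[j].append(cf)
        merged = [merge(g) for g in groups if g]
        updated = [[s / n for s in ls] for n, ls, _ in merged]
        if updated == centers:  # assignments are stable
            break
        centers = updated
    return merged


def bic(clusters, dim):
    """BIC = -2 ln L + p ln n for hard-assigned spherical Gaussians sharing
    one variance; lower is better. The SSE comes from the CF sums alone."""
    n = sum(m for m, _, _ in clusters)
    k = len(clusters)
    sse = sum(ss - sum(x * x for x in ls) / m for m, ls, ss in clusters)
    variance = max(sse / (n * dim), 1e-12)
    log_l = sum(m * math.log(m / n) for m, _, _ in clusters)
    log_l -= 0.5 * n * dim * (math.log(2 * math.pi * variance) + 1)
    params = (k - 1) + k * dim + 1  # mixing weights, centres, variance
    return -2 * log_l + params * math.log(n)


def choose_clustering(points, threshold, k_range):
    """Return (number of clusters with the lowest BIC, number of sub-clusters)."""
    cfs = build_subclusters(points, threshold)
    dim = len(points[0])
    scored = []
    for k in k_range:
        clusters = cluster_subclusters(cfs, k)
        scored.append((bic(clusters, dim), len(clusters)))
    return min(scored)[1], len(cfs)


# Synthetic stand-in for flow records: two hypothetical, already-scaled
# features per flow, drawn around three well-separated traffic classes.
rng = random.Random(0)
class_centers = [(0.0, 0.0), (8.0, 0.0), (4.0, 8.0)]
flows = [(rng.gauss(cx, 0.6), rng.gauss(cy, 0.6))
         for cx, cy in class_centers for _ in range(100)]

best_k, n_subclusters = choose_clustering(flows, threshold=0.8, k_range=range(1, 7))
print("sub-clusters:", n_subclusters, "clusters chosen by BIC:", best_k)
```

Because the later steps only touch the sub-cluster summaries, the raw flow records are read once, which is one plausible reading of the reduced I/O cost mentioned above. Choosing the candidate with the lowest BIC removes the need to fix the number of clusters by hand.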