Research on Techniques of Collecting Internet Traffic Ground Truth
|School||Shanghai Jiaotong University|
|Course||Applied Computer Technology|
|Keywords||Traffic classification, DPI, DFI, Machine Learning, Heuristic rules|
How to classify Internet traffic by application accurately and efficiently is an important research topic and a major challenge in the field of network data analysis. At the large scale it helps governments and Internet service operators; at the small scale it gives network administrators and network data analysts a cost-effective way to manage network traffic and maintain network security. Traditional port-based and DPI-based classification are at a distinct disadvantage given the dramatic growth in P2P application types and their poor resistance to encryption, which has led to a sharp decline in classification coverage. Classification based on DFI or on flow topology, in turn, needs pure ground-truth traffic as learning input. Given that DPI is, apart from manual identification, the most versatile and reliable ground-truth classification method, this thesis focuses on how to improve DPI classification coverage: a DFI module is introduced into the original DPI system, implementing a supervised machine learning classification algorithm within the system, so that unknown traffic the original DPI module cannot classify goes through a second round of classification and identification. Experimental results show that the new hybrid traffic classification model alleviates, to some extent, the DPI classification system's poor resistance to encryption and weak protocol scalability, which lead to low classification coverage and difficult maintenance.
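The two-stage arrangement described above (DPI first, then a supervised flow-feature classifier for whatever DPI leaves unknown) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Flow` fields, the two payload signatures, the two flow features, and the nearest-centroid learner, which merely stands in for the supervised algorithm, since the abstract does not name one.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Optional, Tuple


@dataclass
class Flow:
    payload: bytes        # first payload bytes, inspected by the DPI stage
    mean_pkt_size: float  # flow-level features, used by the DFI stage
    mean_gap_ms: float


# Hypothetical payload signatures; a real DPI engine uses far richer rules.
SIGNATURES = {b"GET ": "http", b"\x13BitTorrent protocol": "bittorrent"}


def dpi_classify(flow: Flow) -> Optional[str]:
    """Stage 1: return an application label, or None if no signature matches."""
    for prefix, app in SIGNATURES.items():
        if flow.payload.startswith(prefix):
            return app
    return None


def features(flow: Flow) -> Tuple[float, float]:
    return (flow.mean_pkt_size, flow.mean_gap_ms)


def train_centroids(labeled: List[Tuple[Flow, str]]) -> Dict[str, Tuple[float, float]]:
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums: Dict[str, Tuple[float, float, int]] = {}
    for flow, label in labeled:
        x, y = features(flow)
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def dfi_classify(flow: Flow, centroids: Dict[str, Tuple[float, float]]) -> str:
    """Stage 2: pick the class whose centroid is closest in feature space."""
    return min(centroids, key=lambda label: dist(features(flow), centroids[label]))


def hybrid_classify(flow: Flow, centroids) -> Tuple[str, str]:
    """Try DPI first; fall back to the flow-feature classifier on unknown traffic."""
    label = dpi_classify(flow)
    if label is not None:
        return label, "dpi"
    return dfi_classify(flow, centroids), "dfi"
```

In this sketch the flows that DPI labels successfully would serve as the training set for the second stage, for example `centroids = train_centroids([(f, dpi_classify(f)) for f in flows if dpi_classify(f)])`. Note that the fallback always returns one of the trained classes, so a practical system would also need a distance threshold or similar rule to keep genuinely unknown traffic unlabelled.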
Combining DFI classification, however, introduces the problem of reduced application classification granularity, owing to the principles and limitations of the algorithm itself. The thesis therefore proposes a novel DPI ground-truth classification model based on a heuristic traffic identification algorithm: using recorded flow topology information, heuristic traffic classification is carried out at different levels before the actual DPI parsing process, so that unknown application flows are quickly classified as a known application protocol. Finally, a lightweight heuristic DPI classification method is implemented in combination with Wireshark; it effectively improves classification efficiency and classification coverage while maintaining application classification granularity. The results of this research broaden the research ideas on traffic ground-truth classification techniques and have some theoretical significance and practical value.
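As a rough illustration of classifying a flow heuristically before DPI parsing, the following Python sketch caches the label of each server endpoint that DPI has already identified and reuses it for later flows to the same endpoint. This is only one possible reading of "flow topology information": the abstract does not specify the heuristics or their levels, and the class name, the endpoint key, and the `toy_dpi` function are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (server IP, server port); assumed topology key


class HeuristicPreClassifier:
    """Consult recorded endpoint labels first and run DPI only on a miss."""

    def __init__(self, dpi: Callable[[bytes], Optional[str]]):
        self.dpi = dpi
        self.known_endpoints: Dict[Endpoint, str] = {}
        self.dpi_calls = 0  # counts how often the expensive stage actually runs

    def classify(self, server: Endpoint, payload: bytes) -> Optional[str]:
        # Heuristic stage: an endpoint already seen serving an application
        # is assumed to serve the same application again.
        label = self.known_endpoints.get(server)
        if label is not None:
            return label
        # Fallback stage: payload inspection, then record the result.
        self.dpi_calls += 1
        label = self.dpi(payload)
        if label is not None:
            self.known_endpoints[server] = label
        return label


def toy_dpi(payload: bytes) -> Optional[str]:
    """Stand-in for a real DPI engine: matches a single signature."""
    if payload.startswith(b"\x13BitTorrent protocol"):
        return "bittorrent"
    return None
```

With `pc = HeuristicPreClassifier(toy_dpi)`, a first flow to `("10.0.0.5", 6881)` carrying the plaintext handshake is labelled by DPI, and a later encrypted flow to the same endpoint is labelled from the cache without another DPI call. The label keeps the application-level granularity of the DPI result, which is the property the abstract attributes to the heuristic approach.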