Research on the Dataset of the Internet Traffic with Accurate Application Information
|Course||Applied Computer Technology|
|Keywords||designing; Socket Hook; NDIS; collecting the network traffic; marking the dataset|
With the continuous improvement of network technology, the growing number of network applications, and the rapid growth of Internet traffic, accurate traffic classification has become increasingly important for network management tasks such as deploying QoS-aware mechanisms, managing bandwidth budgets, and intrusion detection. However, traditional classification methods are becoming less effective because of the use of dynamic ports and encryption in current network environments. Over the past few years, much research has therefore focused on machine learning techniques, which rely on the macro-patterns of Internet traffic rather than its micro-features. Most machine-learning-based traffic identification techniques use traffic samples collected on the network backbone, but these samples are not labeled with accurate application information. Obtaining a network traffic dataset with accurate application information has thus become a difficult problem in the field of traffic classification. A traffic trace with accurate application information can provide the training and testing datasets for machine-learning-based traffic classification.

To address the problem of obtaining network traffic traces with accurate application labels, a scheme is proposed for building an accurately labeled network traffic dataset. The scheme is then deployed in a real network environment, where network traffic carrying real application information is captured.

First, a socket hook and a passthru-based Network Driver Interface Specification (NDIS) driver are developed and installed on user hosts running Windows.
The socket hook captures all network applications that make socket calls and obtains, in the application layer, the mapping between the five-tuple (source IP address, source port, destination IP address, destination port, and protocol) and the application ID. In the kernel layer, the passthru-based NDIS intermediate driver intercepts the network traffic and obtains the mapping between the five-tuple and the application ID. The passthru driver then marks the TOS field of each IP packet with the application ID according to the five-tuple information. The marked packets are transmitted after the checksum is recalculated.

Second, a traffic collector based on a field-programmable gate array (FPGA) captures the network traffic at a backbone node of the network, using traffic mirrored from the router. The traffic collector then filters the captured traffic according to the TOS value and sends it to the data server for storage and processing.

Finally, the collected traffic is aggregated into flows according to the five-tuple and the TOS value of each packet. The aggregated flows are then filtered and encrypted, and the processed traffic is used to produce the network traffic trace with accurate application labels.
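
The TOS-marking and flow-aggregation steps described above could be sketched as follows. This is only a minimal Python illustration, not the driver code of this work, which runs as an NDIS intermediate driver in the Windows kernel. The function names `ipv4_header_checksum`, `mark_tos`, `flow_key`, and `group_flows` are hypothetical, and the sketch assumes IPv4 packets carrying TCP or UDP and an application ID that fits in the 8-bit TOS field.

```python
import struct
from collections import defaultdict


def ipv4_header_checksum(header: bytes) -> int:
    """One's-complement checksum over 16-bit words of an IPv4 header.

    The caller zeroes the checksum field before computing a new value.
    Running this over a header with a valid checksum returns 0.
    """
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mark_tos(packet: bytes, app_id: int) -> bytes:
    """Return a copy of an IPv4 packet whose TOS byte carries app_id."""
    if not 0 <= app_id <= 0xFF:
        raise ValueError("application ID must fit in the 8-bit TOS field")
    ihl = (packet[0] & 0x0F) * 4          # header length in bytes
    header = bytearray(packet[:ihl])
    header[1] = app_id                    # byte 1 of the header is TOS
    header[10:12] = b"\x00\x00"           # zero the checksum field
    header[10:12] = struct.pack("!H", ipv4_header_checksum(bytes(header)))
    return bytes(header) + packet[ihl:]


def flow_key(packet: bytes) -> tuple:
    """Five-tuple plus TOS value (assumes TCP or UDP: ports come first)."""
    ihl = (packet[0] & 0x0F) * 4
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    return (packet[12:16], src_port, packet[16:20], dst_port,
            packet[9], packet[1])


def group_flows(packets):
    """Aggregate packets into flows keyed by five-tuple and TOS."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    return flows
```

Because the TCP and UDP checksums cover a pseudo-header that does not include the TOS byte, only the IPv4 header checksum needs to be recomputed after marking in this sketch.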