
Instant Messaging Traffic Identification Based On Machine Learning

Posted on: 2020-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: X N Hao
Full Text: PDF
GTID: 2428330602952248
Subject: Information security
Abstract/Summary:
With the growth of networked information systems, the number of network users and the scale of network applications continue to expand. Network traffic is growing accordingly and carries many security risks, which makes information security difficult to guarantee. Accurately identifying traffic so that the network can be regulated has therefore become a focus of current network security research. Instant messaging is highly real-time and low-cost. It has changed the way people communicate, and its traffic accounts for a large share of network traffic. Because of their own functional and performance requirements, different instant messaging applications generally use proprietary and encrypted protocols to keep data transmission secure. Therefore, to support effective network management, public opinion monitoring, national defense security, and similar goals, it is necessary to identify instant messaging traffic effectively.

Traditional network traffic identification methods mainly include port-based identification, identification based on deep packet inspection, and identification based on user behavior characteristics. With the rise of artificial intelligence, machine learning has also been applied to traffic identification. Because of port multiplexing, port-based identification is gradually becoming ineffective. Methods based on user behavior characteristics cover only a limited range of application types and scenarios. Deep packet inspection is the most commonly used method, but for encrypted traffic, or when the network scale is large, it cannot meet the requirements for real-time performance and accuracy. As network communication technology is updated and application software diversifies, these traditional detection methods no longer identify traffic well. Machine learning based methods, by contrast, can describe complex behavior and reduce the difficulty of modeling, and they have become a hot research topic.

In view of these problems, this paper uses machine learning to study the identification of instant messaging traffic. After analyzing the state of research in traffic identification in recent years, it proposes an instant messaging traffic identification scheme based on machine learning. The scheme distinguishes instant messaging protocols by extracting the heartbeat behavior in instant messaging traffic and using it as the classification feature. Analysis of the traffic of instant messaging applications shows that the heartbeat behavior on their long-lived connections differs greatly from one application to another. A heartbeat process has distinctive packet interval times and packet sizes, whereas the packets produced by other behaviors are highly similar. Removing those non-discriminative packets can significantly improve identification accuracy.

This paper proposes two methods of heartbeat packet extraction: one based on statistics and one based on association rule mining. The statistics-based method extracts the heartbeat process by calculating the similarity of clusters of packets of different sizes. The association rule mining method extracts the heartbeat process by mining association rules between packets.

This paper first collects traffic data from four commonly used instant messaging applications: Ali IM, WeChat, DingTalk, and QQ. It analyzes their heartbeat behavior and then extracts their heartbeat packets using the proposed scheme. In the feature extraction stage, the behavior characteristics of the heartbeat process are extracted, and RFE is then used to eliminate useless features, which improves the computation speed and generalization ability of the model. Finally, a variety of machine learning classification algorithms are selected for modeling and testing. The experimental results show that, by extracting the heartbeat behavior, the scheme can achieve 99% identification accuracy, which addresses the shortcomings of existing work.
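
The abstract says a heartbeat stands out by its packet interval time and packet size, but it does not spell out the extraction procedure. The Python sketch below is a simplified stand-in, not the thesis's statistics-based method: it groups packets by size and keeps the groups whose inter-arrival gaps are nearly constant. The function name `find_heartbeat_sizes`, the thresholds `min_count` and `max_cv`, and the synthetic trace are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_heartbeat_sizes(packets, min_count=4, max_cv=0.1):
    """Return {packet_size: mean_interval} for heartbeat-like packet groups.

    packets: iterable of (timestamp_seconds, size_bytes) for one connection.
    min_count and max_cv are illustrative thresholds, not values from the thesis.
    """
    # Group arrival times by packet size.
    times_by_size = defaultdict(list)
    for timestamp, size in packets:
        times_by_size[size].append(timestamp)

    heartbeats = {}
    for size, times in times_by_size.items():
        if len(times) < min_count:
            continue  # too few packets to judge periodicity
        times.sort()
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        average_gap = mean(gaps)
        if average_gap <= 0:
            continue
        # Coefficient of variation: low values mean near-constant intervals.
        variation = pstdev(gaps) / average_gap
        if variation <= max_cv:
            heartbeats[size] = average_gap
    return heartbeats


# Synthetic trace (hypothetical values): a 66-byte packet every 30 seconds,
# mixed with bursty 512-byte chat packets and two one-off packets.
packets = [(i * 30.0, 66) for i in range(6)]
packets += [(1.0, 512), (2.5, 512), (40.0, 512), (41.0, 512), (140.0, 512)]
packets += [(5.0, 1400), (90.0, 900)]

result = find_heartbeat_sizes(packets)
print(result)  # {66: 30.0}
```

In this trace only the 66-byte group passes, because its gaps are constant while the 512-byte group is bursty. Real traffic would likely need tolerance for size variation and jitter, which this sketch does not model.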
Keywords/Search Tags: Instant Messaging, Traffic Identification, Heartbeat Extraction, Heartbeat Behavior, Machine Learning