| The rapid development of the Internet has brought convenience to our lives, but it has also given rise to a wide variety of malicious code. Botnets, Trojans, ransomware, spyware, and other malicious code pose a serious threat to network security. Developers of malicious code now routinely use executable packing and other code obfuscation techniques to generate large numbers of malware variants, which makes malicious code difficult to identify. Studies have found, however, that most malicious code attempts to achieve its objectives through network activity. This paper therefore proposes a novel method for detecting malicious code. The key idea is to distinguish the network traffic generated by the network behavior of malicious code from normal network access traffic. To this end, the paper makes the following efforts. First, it analyzes and compares the attack principles and network behaviors of botnets, ransomware, Trojans, and several other types of malicious code, and generalizes the typical network behavior features that a host exhibits after infection. Taking botnets as an example, since they are the most harmful type and combine features of multiple kinds of malicious code, the paper analyzes the life cycle and communication mechanism of malicious code and summarizes the network flow patterns of its command-and-control (C&C) communication. Malicious code is in essence a computer program that executes commands according to preset procedures, so the network traffic it generates exhibits specific patterns. Next, the paper aggregates network packets into network flows by five-tuple (source IP address, destination IP address, source port, destination port, and protocol). Existing network traffic identification theories and feature extraction strategies are used to analyze the flow features of malicious code during network communication. A feature set is constructed that describes the network behavior patterns of malicious code using packet size, inter-arrival time, data volume, and other statistical features. Then, a supervised machine learning strategy is used to build the basic framework for traffic identification, with the random forest algorithm as the classifier, yielding a flow-feature-based model for identifying malicious code traffic. Finally, simulation experiments are conducted to verify the feasibility of the proposed model: using the malicious code flow feature set, the random forest algorithm trains the classification model on a network traffic dataset of malicious code. The results show that the proposed model can effectively identify traffic generated by malicious code and detect malicious behavior in network activity. Detecting malicious code through its network behavior is therefore feasible. |
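
The five-tuple aggregation and statistical flow features described in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the `Packet` record, its field names, the chosen features, and the sample addresses are all assumptions made for illustration, and flows are treated as unidirectional for simplicity.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical packet record; field names are illustrative only.
    timestamp: float  # capture time in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    size: int  # packet length in bytes


def aggregate_flows(packets):
    """Group packets into unidirectional flows keyed by five-tuple."""
    flows = defaultdict(list)
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flows[key].append(p)
    return flows


def flow_features(flow_packets):
    """Compute simple statistical features for the packets of one flow."""
    pkts = sorted(flow_packets, key=lambda p: p.timestamp)
    sizes = [p.size for p in pkts]
    # Inter-arrival times between consecutive packets of the flow.
    gaps = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
    return {
        "packet_count": len(pkts),
        "total_bytes": sum(sizes),
        "mean_packet_size": mean(sizes),
        "mean_interarrival": mean(gaps) if gaps else 0.0,
    }


# Made-up sample: one flow with regular, same-sized packets (the kind of
# periodic pattern C&C traffic may show) and one single-packet flow.
packets = [
    Packet(0.0, "10.0.0.5", "203.0.113.9", 49152, 443, "TCP", 120),
    Packet(60.0, "10.0.0.5", "203.0.113.9", 49152, 443, "TCP", 120),
    Packet(120.0, "10.0.0.5", "203.0.113.9", 49152, 443, "TCP", 120),
    Packet(1.5, "10.0.0.5", "198.51.100.7", 49153, 80, "TCP", 900),
]

features = {key: flow_features(pkts) for key, pkts in aggregate_flows(packets).items()}
```

Each per-flow dictionary could then be converted into a feature vector and, together with labels, passed to a supervised classifier such as a random forest; that training step is omitted here because the abstract does not specify the dataset or the exact feature set.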