| Since the inception of the Internet, malicious programs have posed a constant threat to online users, stealing personal information and compromising computer systems. Malicious programs are now studied and detected worldwide. Given the substantial increase in their types and quantities, processing and analyzing malicious code quickly and accurately has become a crucial issue. This thesis starts from this problem and proposes new solutions that address the shortcomings of existing approaches in both static and dynamic analysis. In terms of static analysis, the thesis is motivated by the observation that, once malicious programs are converted into images, samples from the same family produce visually similar images, so malicious program identification can be recast as an image recognition problem. Many researchers have previously used machine learning to identify and classify malicious programs, but the existing solutions do not perform well on massive sample sets. Deep learning has shown great advantages in image recognition in recent years. Because existing deep learning models suffer from low accuracy and long running time on imbalanced data, this thesis designs a preprocessing and sample augmentation technique. A Convolutional Neural Network (CNN) model is used for feature extraction and identification of malicious program images. As for the classifier, considering that current malware image datasets do not contain enough samples, this thesis further proposes replacing the softmax activation function in the traditional CNN model with an L2-regularized Support Vector Machine (L2-SVM), which classifies malicious programs according to the features extracted by the CNN model. Experiments demonstrate that the proposed CNN-SVM hybrid model is both more accurate and faster than existing CNN models. In terms of dynamic analysis, the most widely used dynamic feature is the Application Programming Interface (API). However, because API parameters are heterogeneous, current API-based malware detectors either rely heavily on statistical characteristics of the API without considering its parameter information, which is insufficient to fully characterize a malicious program and yields low accuracy, or process parameter information in ways that require substantial domain expertise and complex operations. To overcome these deficiencies, this thesis proposes a lightweight dynamic feature extraction method that takes API parameters into account and feeds the extracted features into a machine learning algorithm to identify and classify malicious programs. The experimental results show that the proposed feature extraction method avoids high processing costs while still using API parameters to achieve better accuracy. |
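
The two static-analysis ideas in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the imaging procedure or the loss, so the fixed image width of 256, the zero padding, and the reading of L2-SVM as a one-vs-rest squared hinge loss on the network's raw outputs are all assumptions. The names `bytes_to_grayscale` and `l2_svm_loss` are hypothetical.

```python
import numpy as np


def bytes_to_grayscale(data: bytes, width: int = 256) -> np.ndarray:
    """Turn a binary into a 2-D grayscale image, one byte per pixel.

    Assumption: fixed row width and zero padding of the last row;
    the thesis may use a different layout.
    """
    pixels = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(pixels) // width)  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(pixels)] = pixels
    return padded.reshape(height, width)


def l2_svm_loss(scores: np.ndarray, labels: np.ndarray) -> float:
    """Mean one-vs-rest squared hinge loss over a batch.

    scores: (n, k) raw outputs of the last layer (no softmax applied).
    labels: (n,) integer class ids in [0, k).
    Assumption: "L2-SVM" is taken to mean the squared hinge loss.
    """
    n, k = scores.shape
    targets = -np.ones((n, k))
    targets[np.arange(n), labels] = 1.0  # +1 for the true class, -1 otherwise
    margins = np.maximum(0.0, 1.0 - targets * scores)
    return float(np.mean(np.sum(margins ** 2, axis=1)))


if __name__ == "__main__":
    image = bytes_to_grayscale(bytes(range(10)), width=4)
    print(image.shape)
    print(l2_svm_loss(np.array([[0.0, 0.0]]), np.array([0])))
```

In a hybrid of this kind, the convolutional layers would be trained against the hinge-style loss instead of cross-entropy, and the predicted family would be the class with the largest raw score.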