Research On Malicious Code Detection Method Based On Time Series Feature

Posted on: 2022-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Gao
GTID: 2518306326984699
Subject: Master of Engineering
Abstract/Summary:
Malicious code breaks into computer systems through web links, system vulnerabilities, email, and other channels, causing great losses to users, especially users of Windows, today's most popular desktop operating system. The study of dynamic malicious code detection is therefore of great significance for building a safe and healthy network environment. In recent years, researchers have used data mining methods to detect malicious code and have achieved high recognition rates. However, traditional machine learning methods require security personnel to design features manually in order to construct detection models, which demands considerable expert experience. Deep learning methods can extract features automatically, but their black-box nature makes it difficult to explain the basis of the model's decisions. The purpose of data mining research is to help people find the key information in data, a property referred to here as interpretability. Methods from the field of time series classification offer a reference for both automatic feature extraction and model interpretation. Focusing on these two goals, this thesis represents the dynamic API call sequence of malicious code as a time series and studies malicious code detection based on time series features. The main work of this thesis includes:

(1) A malicious code detection method based on time series features is studied. Experimental analysis shows that segments of malicious API call sequences and segments of normal call sequences differ greatly in the amount of information they contain. By calculating the local information entropy of the dynamic API sequence, the API call sequence is converted into an entropy time series. At the information entropy level, the Shapelet transform algorithm from time series classification is then used to extract time series features automatically and to train a classifier for malicious code detection. The experimental results show that the proposed method is more accurate than traditional methods and that its results can be interpreted.

(2) To address the low efficiency of the Shapelet algorithm for time series classification, a Shapelet-based acceleration algorithm is studied. Based on the idea of random projection, an improved Shapelet transform algorithm, the Hash Shapelet transform, is proposed to improve the time efficiency of the existing algorithm. Unlike the entropy-level feature extraction in the first work, the improved algorithm extracts time series features automatically from the original API call sequence for malicious code detection, which improves both the accuracy and the time efficiency of detection.

(3) A malicious code detection system based on time series features is designed and implemented, and the overall design and functional modules of the system are described in detail.
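The pipeline in contribution (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the sliding-window size, the use of Shannon entropy in bits, the plain (unnormalized) Euclidean distance, and the function names `local_entropy_series`, `shapelet_distance`, and `shapelet_transform` are all assumptions made for illustration. The sketch converts an API call sequence into a local-entropy time series and then maps that series to a feature vector of minimum distances to a set of candidate shapelets, which a classifier could consume.

```python
import math
from collections import Counter


def local_entropy_series(api_calls, window=4):
    """Shannon entropy (in bits) of every sliding window over an API call sequence.

    The window size is an assumed parameter; the abstract does not specify one.
    """
    if window <= 0 or window > len(api_calls):
        raise ValueError("window must be between 1 and len(api_calls)")
    series = []
    for start in range(len(api_calls) - window + 1):
        counts = Counter(api_calls[start:start + window])
        # H = sum over distinct calls of p * log2(1 / p), with p = count / window
        entropy = sum((c / window) * math.log2(window / c) for c in counts.values())
        series.append(entropy)
    return series


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between a shapelet and any same-length subsequence."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet length must be between 1 and len(series)")
    best = math.inf
    for start in range(len(series) - m + 1):
        squared = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best = min(best, math.sqrt(squared))
    return best


def shapelet_transform(series, shapelets):
    """Represent a series by its distance to each candidate shapelet."""
    return [shapelet_distance(series, s) for s in shapelets]


if __name__ == "__main__":
    # Hypothetical API trace, used only to exercise the functions.
    trace = ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle",
             "RegSetValueExW", "CreateRemoteThread"]
    entropy_series = local_entropy_series(trace, window=2)
    # Hypothetical shapelets; in practice they would be selected from training data.
    features = shapelet_transform(entropy_series, [[0.0, 1.0], [1.0, 1.0]])
    print(entropy_series, features)
```

Because each feature is the distance to a concrete subsequence, a small distance points back to the segment of the call sequence that matched, which is the sense in which Shapelet-based features are interpretable.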
Keywords/Search Tags: Malicious code detection, time series classification, Shapelet, information entropy, interpretability