Research And Application Of Malware Identification Method

Posted on:2018-02-27

Degree:Master

Type:Thesis

Country:China

Candidate:W Chen

GTID:2348330512983342

Subject:Computer software and theory

Abstract/Summary:

In recent years, China’s Internet industry has developed rapidly. Many activities that once existed only in the physical world now take place online, such as e-commerce, social networking, and Internet finance. People’s lives have become increasingly inseparable from the Internet. As the Internet has become part of daily life, criminals have moved online as well. They use malware to steal users’ account passwords, spy on their private data, send spam, and otherwise profit from or damage the normal Internet environment, which seriously harms the interests of ordinary users. Accurate identification of malware is therefore critical to protecting those users. However, most anti-virus software still relies on signature detection, heuristic search, and similar detection techniques, while malware identification based on machine learning has not been widely studied or deployed. This thesis therefore studies machine-learning-based malware identification, using Windows executable files as the example. The main work is as follows:

1. Static and dynamic analysis are used to extract information from software samples. From this information, six types of features are constructed to describe each sample comprehensively: PE file header features, readable string features, key behavior features, API call frequency features, API call time series, and network features.

2. A multi-feature-group model combination algorithm based on XGBoost is proposed for malware identification. The algorithm trains several classification models, which is comparable to training a number of malware experts in different domains. Each classification model is trained on one or more feature groups, and the models’ detection results are then combined to reach the final identification result. Experimental results show an accuracy of 97.6%, a recall of 97.1%, and a precision of 96.7%, which are higher than those of classical classification algorithms. Applying this combination algorithm to malware identification is one of the main contributions and innovations of this thesis.

3. A deep neural network based on LSTM is constructed to extract high-level abstract features from the API call time series of software samples. Drawing on the main idea of deep residual networks, the six types of malware features, including the high-level abstract features, are then used to train a deep neural network with shortcut connections for malware identification. Experiments show an accuracy of 98.1%, a recall of 97.9%, and a precision of 97.1%, indicating that this network identifies malware more accurately.
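The combination step in point 2 and the three reported metrics can be illustrated with a minimal sketch. The abstract does not say how the per-model results are synthesized, so the averaging rule, the 0.5 threshold, the function names, and the toy scores below are all assumptions made for illustration. They are not the thesis's method or data, and the per-model probabilities stand in for the outputs of XGBoost models trained on different feature groups.

```python
from statistics import mean


def combine_scores(model_scores, threshold=0.5):
    """Combine per-model malware probabilities for one sample.

    Assumption: simple averaging followed by a threshold. The abstract
    does not specify the actual combination rule.
    """
    return 1 if mean(model_scores) >= threshold else 0


def evaluate(y_true, y_pred):
    """Return (accuracy, recall, precision) for binary labels, 1 = malware."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, recall, precision


# Hypothetical probabilities from three models, one row per sample.
# Each column stands in for a model trained on its own feature group(s).
scores = [
    [0.9, 0.8, 0.7],
    [0.2, 0.4, 0.1],
    [0.6, 0.3, 0.7],
    [0.4, 0.6, 0.2],
]
y_true = [1, 0, 1, 1]  # made-up ground-truth labels

y_pred = [combine_scores(row) for row in scores]
accuracy, recall, precision = evaluate(y_true, y_pred)
print(y_pred, accuracy, recall, precision)
```

In this toy data the last sample is malware that only one of the three models flags, so averaging misses it and recall drops while precision stays high. This is why the abstract reports accuracy, recall, and precision separately.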

Keywords/Search Tags:

malware identification, XGBoost, model combination, LSTM, deep residual networks

Related items

1	Research On Sales Of Dishes Forecast Based On Deep Learning And Combination Model
2	Application Of LSTM Hybrid Model In Shanghai Composite Index Forecast
3	China-ASEAN Academic Field Sentiment Analysis Model Based On CNN-LSTM
4	Research On Question Classification Combination Model Based On Deep Learning
5	Research On APT Malware Traffic Detection Method Based On Association Rules And Timing Characteristics
6	Research On Malware Detection Based On Machine-learning
7	Prediction Of Crude Oil Futures Price Based On EEMD-ARIMA-LSTM Combination Model
8	Research On The Application Of Enterprise Illegal Fund-raising Identification Based On XGBoost And LR Integrated Model
9	Research On Deep Residual Networks Of Residual Networks For Image Classification
10	Research On Deep Learning Based Malware Feature Analysis And Detection Method