
Research And Application Of Malware Identification Method

Posted on: 2018-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: W Chen
Full Text: PDF
GTID: 2348330512983342
Subject: Computer software and theory
Abstract/Summary:
In recent years, China’s Internet industry has developed rapidly. Many activities that once existed only in the physical world now take place online, such as e-commerce, social networking, and Internet finance, and people’s daily lives have become increasingly inseparable from the Internet. As the Internet has become part of everyday life, criminals have moved online as well. They use malware to steal account passwords, spy on users’ private data, and send spam, either for profit or to disrupt the normal Internet environment, which seriously harms the interests of ordinary users. Accurate identification of malware is therefore critical to protecting those users. However, most anti-virus software still relies on signature detection, heuristic search, and similar detection techniques, and malware identification based on machine learning has not yet been widely studied or deployed. This thesis therefore studies machine-learning-based malware identification, using Windows executable files as the example. The main work is as follows:

1. Static and dynamic analysis are used to extract features from software samples. From the extracted information, six types of features are constructed to describe each sample comprehensively: PE file header features, readable string features, key behavior features, API call frequency features, API call time series, and network features.

2. A multi-feature-group model combination algorithm based on XGBoost is proposed for malware identification. The algorithm trains several classification models, each on one or more feature groups, which is equivalent to training a number of malware experts in different domains. Their detection results are then combined to arrive at the final identification result. Experimental results show an accuracy of 97.6%, a recall of 97.1%, and a precision of 96.7%, all higher than those of classical classification algorithms. Applying this combination algorithm to malware identification is one of the main contributions and innovations of this thesis.

3. A deep neural network based on LSTM is constructed to extract high-level abstract features from the time series of each sample’s API calls. Borrowing the main idea of deep residual networks, the six types of malware features, including these high-level abstract features, are then used to train a deep neural network with shortcut connections for malware identification. In experiments, this network achieves an accuracy of 98.1%, a recall of 97.9%, and a precision of 97.1%, identifying malware more accurately.
Keywords/Search Tags: malware identification, XGBoost, model combination, LSTM, deep residual networks
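
The abstract describes training one model per feature group and then synthesizing their detection results, but it does not say how the results are combined. The following is only a minimal sketch of one plausible scheme, assuming each per-group model outputs a malware probability and that the final decision is a weighted average followed by a threshold. The function names (`combine_scores`, `classify`, `evaluate`), the group names, and all scores are hypothetical placeholders. No XGBoost model is trained here. The sketch also shows how the accuracy, precision, and recall figures quoted in the abstract are conventionally computed.

```python
from typing import Dict, List, Optional, Sequence


def combine_scores(
    group_scores: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average of per-model malware probabilities for each sample.

    Assumption: averaging is an illustrative combination rule, not
    necessarily the one used in the thesis.
    """
    names = list(group_scores)
    if not names:
        raise ValueError("at least one model is required")
    n_samples = len(group_scores[names[0]])
    if any(len(group_scores[name]) != n_samples for name in names):
        raise ValueError("all models must score the same samples")
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    return [
        sum(weights[name] * group_scores[name][i] for name in names) / total
        for i in range(n_samples)
    ]


def classify(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a sample as malware (1) when its combined score reaches the threshold."""
    return [1 if score >= threshold else 0 for score in scores]


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall with malware (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical per-group probabilities for four samples (placeholder values).
toy_scores = {
    "pe_header": [0.9, 0.2, 0.6, 0.1],
    "api_frequency": [0.8, 0.4, 0.3, 0.2],
    "network": [0.7, 0.3, 0.9, 0.3],
}
toy_labels = [1, 0, 1, 1]  # hypothetical ground truth: 1 = malware, 0 = benign

predictions = classify(combine_scores(toy_scores))
print(predictions)
print(evaluate(toy_labels, predictions))
```

In a real pipeline, each list in `toy_scores` would come from a classifier trained on the corresponding feature group. The `weights` argument could then favor the groups that perform best on validation data.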