
Android Malware Detection Based On Multi-feature Information Gain

Posted on: 2019-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: M T Zhou
Full Text: PDF
GTID: 2428330572451516
Subject: Computer software and theory
Abstract/Summary:
Android has become one of the most popular mobile platforms, and Android applications are now closely tied to people's daily lives. Because the platform is committed to openness and free services, it attracts a large number of developers and users. The same openness has also accelerated the growth of Android malware, making application security a serious problem. Malware detection on the Android platform has therefore been an active research topic in recent years.

Current Android malware detection methods fall mainly into dynamic detection and static detection. In dynamic detection, applications must run in an isolated emulated environment, which is time-consuming, difficult to implement, and resource-intensive. Static detection does not require running the application; it is cheap, fast, and consumes few system resources. However, its detection rate for newly emerging malware is not high enough. In this thesis, we propose a scheme for detecting Android malware based on multiple features and information-gain optimization. The scheme retains the low cost and high efficiency of static analysis and uses machine learning to improve the detection rate for unknown malware.

The main contents of this thesis are as follows. First, static analysis is carried out according to the functional category of each application. By decompiling the APK file, we obtain the application's manifest file in a form that can be read and analyzed. The manifest is then parsed and four categories of attributes are extracted: hardware components, system permissions, application components, and intent filters. Next, a feature set is formed by mapping the attributes extracted from samples in the same functional category, together with each sample's class label, into a common vector space. On top of this feature set we add a feature selection step: the information gain of each feature is calculated, the features are sorted in descending order of information gain, and the top-ranked features, which are the most valuable for classification, form the optimal feature subset. Finally, a KNN classifier is trained on the resulting data sets to obtain a learned model for each functional category, and these models are used to analyze and detect Android applications automatically.

In the experiments, we select samples from 15 functional categories, comprising 1191 benign samples and 1191 malicious samples, with the numbers of benign and malicious samples balanced within each category. Using the trained models, we first compare the classification results on the test set of each functional category before and after feature selection. This comparison demonstrates the effectiveness of our scheme and the effect of information-gain optimization on classification: feature selection not only makes the models easier to train but also improves their accuracy. Second, we compare our per-category models with a single model built without regard to application function. The results show that a detection scheme based on the functional categories of applications classifies well.
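The pipeline described above (manifest parsing, binary feature vectors, information-gain ranking, KNN) can be illustrated with a minimal Python sketch. This is not the thesis implementation. It assumes the manifest has already been decoded to plain-text XML by a decompilation tool, that features are binary presence flags, and that KNN uses Hamming distance with majority voting; the abstract states none of these details. All function names and the `component:`/`intent:` prefixes are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace prefix that ElementTree attaches to android:* attributes.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def extract_attributes(manifest_xml):
    """Collect the four attribute categories from a decoded manifest (text XML)."""
    root = ET.fromstring(manifest_xml)
    found = set()
    # Hardware components and system permissions.
    for tag in ("uses-feature", "uses-permission"):
        for node in root.iter(tag):
            name = node.get(ANDROID_NS + "name")
            if name:
                found.add(f"{tag}:{name}")
    # Application components.
    for tag in ("activity", "service", "receiver", "provider"):
        for node in root.iter(tag):
            name = node.get(ANDROID_NS + "name")
            if name:
                found.add(f"component:{name}")
    # Intent filters, represented here by their action names (an assumption).
    for filt in root.iter("intent-filter"):
        for node in filt.iter("action"):
            name = node.get(ANDROID_NS + "name")
            if name:
                found.add(f"intent:{name}")
    return found

def vectorize(samples, vocabulary):
    """Map each sample's attribute set to a binary vector over the vocabulary."""
    return [[1 if feat in s else 0 for feat in vocabulary] for s in samples]

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(column, labels):
    """H(labels) minus the entropy of labels conditioned on one feature column."""
    total = len(labels)
    gain = entropy(labels)
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        gain -= len(subset) / total * entropy(subset)
    return gain

def select_top_k(vectors, labels, k):
    """Indices of the k features with the highest information gain."""
    n = len(vectors[0])
    gains = [information_gain([v[j] for v in vectors], labels) for j in range(n)]
    return sorted(range(n), key=lambda j: gains[j], reverse=True)[:k]

def knn_predict(train_vectors, train_labels, query, k=3):
    """Majority vote among the k nearest training vectors (Hamming distance)."""
    dists = sorted((sum(a != b for a, b in zip(v, query)), y)
                   for v, y in zip(train_vectors, train_labels))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]
```

To follow the per-category design in the abstract, one would run `select_top_k` and fit a separate training set for each functional category, then classify a new application with the model for its own category. The number of features kept and the value of `k` are tuning choices that this sketch leaves open.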
Keywords/Search Tags:Android Malware, Static Analysis, Decompilation, Information Gain, K Nearest Neighbor