
Research On Android Malware And Its Family Classification Based On Machine Learning

Posted on: 2023-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: Q Sun
GTID: 2558306905469134
Subject: Computer Science and Technology
Abstract/Summary:
With the widespread popularity of smartphones and the rapid growth of mobile networks, the Android operating system and its applications have become deeply integrated into daily life. At the same time, the emergence of many malicious applications interferes with normal use. It is therefore of great practical significance to detect malware efficiently and accurately and to identify the malware family to which a sample belongs. This thesis addresses two problems in the field of Android malware detection.

First, research on machine-learning-based Android malware detection usually works with unbalanced samples of normal and malicious applications. This imbalance biases the learned model toward the majority class, which is not conducive to detecting malware. To address this problem, the thesis proposes KMOAD, an Android malware detection method that balances the dataset with K-means SMOTE. The method first performs static analysis of Android applications and extracts feature information including APIs, components and permissions. It then uses feature ranking to preprocess the feature set and remove inefficient features, and applies K-means SMOTE to oversample the minority class so that the numbers of normal and malicious samples are balanced. Finally, three machine learning algorithms, KNN, SVM and ID3, are used to train the detection model. The experimental results show that the classification performance of the resulting detection model is better than that of a model trained on the unbalanced dataset, and also better than that of a model trained on a dataset balanced with basic SMOTE.

Second, most current research on Android malware family identification focuses only on large families with many samples and usually ignores small families with relatively few samples. In response, the thesis proposes a method based on random forest classifiers that uses clustering algorithms to assist decision-making and can improve recognition performance for malware families of all sizes. Static analysis is again used for feature extraction and processing. The thesis studies the performance of three clustering algorithms, DBSCAN, OPTICS and HDBSCAN, on the family identification problem, especially for small families. Experiments show that HDBSCAN performs relatively well on this problem. The random forest classifier is then combined with a clustering algorithm, and the clustering results of the samples are used to assist in identifying malware families. The experimental results show that the proposed model retains good detection performance when all malware families are considered. Among the combinations tested, HDBSCAN with RF performs best.
Keywords/Search Tags: Android malware, machine learning, oversampling, clustering, random forest
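
As an illustration of the oversampling step that the abstract attributes to KMOAD, the sketch below shows the general idea of K-means SMOTE in plain Python. It is a simplified illustration, not the thesis's implementation. The function names `kmeans` and `kmeans_smote`, the parameters, and the toy data are all assumptions made for this example. Two simplifications are deliberate: synthetic points are interpolated between random pairs of minority samples in a cluster rather than between nearest neighbours, and new samples are spread evenly over the selected clusters rather than weighted by cluster sparsity.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def kmeans_smote(X, y, minority=1, k=2, seed=0):
    """Oversample the minority class until both classes have equal size."""
    rng = random.Random(seed)
    assign = kmeans(X, k, seed=seed)
    # Keep clusters that are dominated by the minority class and contain
    # at least two minority points (needed for interpolation).
    pools = []
    for c in range(k):
        size = sum(1 for a in assign if a == c)
        mins = [x for x, lab, a in zip(X, y, assign)
                if a == c and lab == minority]
        if len(mins) >= 2 and len(mins) > size / 2:
            pools.append(mins)
    if not pools:
        # Fallback (assumption of this sketch): use all minority points.
        pools = [[x for x, lab in zip(X, y) if lab == minority]]
    X_out, y_out = list(X), list(y)
    if len(pools[0]) < 2:
        return X_out, y_out  # not enough minority points to interpolate
    n_min = sum(1 for lab in y if lab == minority)
    n_new = (len(y) - n_min) - n_min  # samples needed to reach balance
    for i in range(max(n_new, 0)):
        pool = pools[i % len(pools)]
        a, b = rng.sample(pool, 2)
        t = rng.random()
        # The synthetic point lies on the segment between a and b.
        X_out.append([u + t * (v - u) for u, v in zip(a, b)])
        y_out.append(minority)
    return X_out, y_out


# Toy data: 8 "normal" samples (label 0) and 3 "malicious" samples (label 1).
X = [[0.0, 0.0], [0.5, 0.2], [0.1, 0.9], [1.0, 0.3],
     [0.4, 0.4], [0.8, 0.8], [0.2, 0.6], [0.9, 0.1],
     [10.0, 10.0], [10.5, 10.2], [10.1, 10.9]]
y = [0] * 8 + [1] * 3
X_bal, y_bal = kmeans_smote(X, y, minority=1, k=2, seed=0)
print(y_bal.count(0), y_bal.count(1))
```

In practice the balanced feature matrix would then be passed to a classifier such as the KNN, SVM or ID3 models named in the abstract. A maintained implementation of K-means SMOTE is available in the imbalanced-learn library, which would normally be preferred over a hand-written version like this one.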