| In the past decade, the Android operating system has become the primary target of malware attacks because of its popularity and open market model, which seriously endangers the security of users’ information and property. Accurate detection of Android malware is therefore imperative. Existing models based on a single classifier are limited by their feature set and underlying algorithm, so further improvement of detection accuracy runs into a bottleneck. Meanwhile, detection performance fluctuates with the quality of the data set, and the continuous evolution of malware causes a sharp decline in detection performance over time. Further improving Android malware detection performance, together with the aging resistance and robustness of the model, is therefore of great significance for accurately identifying app categories and protecting users’ information. To address the low detection performance of a single classifier and the poor robustness and aging resistance of current malware detection models, this paper improves the ensemble learning combination strategy through iterative optimization of the feature subset, feature weights, classifier parameters, classifier combination mode, and classifier weight coefficients, and effectively breaks the detection bottleneck of a single classifier. This improves the detection performance of the classifier as well as the aging resistance and robustness of the model. The main work of the paper is as follows: (1) A scheme based on a genetic algorithm is proposed to improve the detection performance of a single classifier. To further improve the detection performance of a single classifier and make it a strong base classifier for the ensemble learning in the subsequent work, in Chapter 3 we first select eight types of features that comprehensively represent application behavior through the
filter-based method, and then find the optimal feature subset through the proposed feature-subset genetic algorithm, followed by a global search over the feature-weight and classifier-parameter chromosome, using the proposed feature-weight and classifier-parameter genetic algorithm, on the basis of the optimal feature subset. Experiments show that this scheme improves the accuracy of a single classifier by about 4%, and other relevant metrics also improve. (2) A two-strategy malware detection scheme based on ensemble learning is proposed. To improve the robustness and aging resistance of the model and reduce the impact of data set noise and malware evolution on detection performance, Chapter 4 proposes an improved ensemble learning architecture based on two combination strategies, Bagging and Stacking. Three kinds of data sets are constructed: a basic data set, a cross-generation data set, and a data set with manually shuffled labels. Two genetic algorithms are proposed, one for the combination of base classifiers and one for the classifier weight coefficients. By searching the solution space, an ensemble learning model with a well-matched combination of base classifiers and weights is designed. Experiments show that the detection performance of this scheme declines more slowly than that of other models on the cross-generation data set; in other words, the scheme has stronger aging resistance. Moreover, on the data set with manually shuffled labels, the scheme maintains stable detection accuracy despite label noise, even when data set quality is poor. In terms of detection accuracy, every metric of this scheme shows a clear advantage over the optimized single classifier. (3) A malware detection model based on genetic algorithms and ensemble learning is designed. Building on Chapters 3 and 4, the model starts from requirements analysis, takes feasibility as its starting point, and combines the functions of each module, so that it maintains high accuracy while having
strong robustness and aging resistance. By collecting misclassified samples, the model can be updated automatically to achieve the expected goal. In conclusion, the malware detection model proposed in this paper is based on genetic algorithms and ensemble learning with different combination strategies, and it effectively improves the detection performance, aging resistance, and robustness of the model. Finally, the shortcomings of this work are analyzed, providing a reference for subsequent improvement of the model. |
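
The genetic search over feature subsets mentioned in item (1) can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the chromosome encoding, the operators, or the fitness function, so the binary mask, tournament selection, single-point crossover, bit-flip mutation, elitism, and all names below (`evolve_feature_subset`, `toy_fitness`, `USEFUL`) are assumptions made for illustration. In a real pipeline the fitness would presumably be the validation score of a classifier trained on the selected features.

```python
import random


def evolve_feature_subset(fitness, n_features, pop_size=30, generations=40,
                          mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes fitness(mask)."""
    rng = random.Random(seed)
    # Each chromosome is a list of 0/1 genes: 1 keeps the feature, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Return the fitter of two randomly drawn individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_population = [best[:]]  # elitism: the best mask always survives
        while len(next_population) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best


# Hypothetical stand-in for a classifier's validation score: features 0, 2
# and 5 are "useful", and every other selected feature costs a small penalty.
USEFUL = {0, 2, 5}


def toy_fitness(mask):
    hits = sum(mask[i] for i in USEFUL)
    extras = sum(mask) - hits
    return hits - 0.1 * extras


best_mask = evolve_feature_subset(toy_fitness, n_features=8)
```

Because the best individual is copied into each new generation, the best fitness never decreases from one generation to the next. The same loop could, under similar assumptions, be reused for the weight-coefficient search in item (2) by switching to real-valued genes.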