| In recent years, with the development and improvement of mobile Internet and smartphone technology, mobile smart terminals have become a necessity in people’s lives. Among mobile operating systems, Android is currently the most widely used, with a 70.40% market share in 2021. However, because the Android system is open source, attackers can easily inject malicious code into benign applications and carry out unauthorized, dangerous behaviors such as malicious fee deduction, unauthorized access, theft of private information, and remote control of smart devices, all of which pose serious threats to end users. Moreover, Android's high market share draws more and more attackers to the platform, leaving it with increasingly serious security problems. How to detect Android malware efficiently and accurately, and how to effectively distinguish its malicious families, has therefore become one of the hot research topics in software security. This thesis studies existing Android malware detection methods in depth and finds that a single type of feature cannot fully describe the behavior of an application, so detection methods that rely on a single type of feature have a relatively high false positive rate. Meanwhile, a feature set built from multiple types of features is very large and contains many redundant features, which not only incurs a large computational overhead during training but also seriously degrades the detection results. In addition, Android malware samples belonging to the same family usually exhibit similar malicious behaviors; that is, malware of the same family reuses malicious code. Researching malware families therefore helps to understand their malicious behaviors more clearly. The main contents of this thesis are as follows. (1) A three-level feature selection method is designed to process the large original feature set and select highly distinguishable
features that effectively separate benign applications from malware, thereby removing noise and reducing both the number of features and the training cost of the model. First, a fast mean-based filtering method is applied to the large original feature set, with two weight calculation methods designed to measure feature importance: the similarity weight and the term frequency-inverse document frequency (TF-IDF) difference. Then, correlation analysis based on the Pearson correlation coefficient is used to find highly correlated features and eliminate them, further reducing the number of features. Finally, a wrapper method, recursive feature elimination with cross-validation, is used to obtain the optimal feature subset. (2) A malware detection method based on highly distinguishable static features is proposed, which addresses the problems that result from using a single type of feature. The DenseNet model is employed in the final classification module for training and for predicting unknown samples. Its densely connected dense blocks not only make feature propagation more effective but also alleviate both the vanishing gradient problem and the growth in the number of parameters that results from adding network layers. The DenseNet model can therefore be made deeper to make full use of the features and improve detection performance with fewer parameters. (3) The executable classes.dex file of an Android malware application is converted into an RGB image using binary sequence visualization technology, and a DenseNet model with an attention mechanism is then applied to the image to classify malware families accurately. (4) Comprehensive experiments are designed to evaluate the three-level feature selection method proposed in this thesis, the Android malware detection method based on highly distinguishable static features, and the malware family classification method based on binary sequence visualization
technology from various aspects. The experimental results show that these methods are effective and that the malware detection method and family classification method proposed in this thesis achieve better accuracy than existing malware detection and family classification methods. |
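
The binary sequence visualization step in contribution (3) can be illustrated with a minimal sketch. The abstract does not specify how bytes are mapped to pixels, the image width, or the padding scheme, so the choices below (three consecutive bytes per RGB pixel, a fixed row width, zero padding) are assumptions made for illustration only, and the function name `bytes_to_rgb_rows` is hypothetical rather than taken from the thesis.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def bytes_to_rgb_rows(data: bytes, width: int = 256) -> List[List[Pixel]]:
    """Map a raw byte sequence to rows of (R, G, B) pixels.

    Assumed scheme (not from the thesis): each run of three consecutive
    bytes becomes one pixel, and rows have a fixed width.
    """
    if width <= 0:
        raise ValueError("width must be positive")

    # Pad with zero bytes so the length is a multiple of 3 (one full pixel).
    padded = data + b"\x00" * (-len(data) % 3)

    # Slicing a bytes object and calling tuple() yields integers in 0..255.
    pixels: List[Pixel] = [
        tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)
    ]

    # Pad with black pixels so the final row is complete.
    pixels += [(0, 0, 0)] * (-len(pixels) % width)

    # Cut the flat pixel list into rows of `width` pixels.
    return [pixels[r:r + width] for r in range(0, len(pixels), width)]


if __name__ == "__main__":
    # Stand-in bytes; in practice these would be read from a classes.dex file.
    sample = bytes(range(7))
    for row in bytes_to_rgb_rows(sample, width=2):
        print(row)
```

In practice the resulting rows would be resized to a fixed resolution before being passed to an image classifier; that resizing step is also not described in the abstract and is left out here.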