| With the rapid development of the mobile market, Android has gradually become the most widely used mobile operating system. At the same time, it has become the main target of malicious attackers. According to statistics, 90% of Android malware consists of variants derived from existing malware. Classifying the continually emerging variants into families therefore not only groups samples by behavioral similarity and shortens malware detection time but also, by exploring how malware evolves within and across families, helps detect and defend against new variants. Existing methods for Android malware family detection and classification mainly rely on single features such as permissions, APIs, or bytecode. To extract more representative features of Android malware and improve the generalization ability of the model, this work makes the following contributions: (1) A classification method for Android malware families based on random forest. Permission and API features are extracted, and a weight calculation is used to select the key permissions, which are defined as sensitive permissions. Sensitive APIs are defined as the union of three parts: the key APIs selected by the weight calculation, the APIs corresponding to the dangerous permissions in Android 5.1.1, and the APIs officially declared highly sensitive. After the two feature types are combined, the random forest algorithm is used for family classification experiments. Compared with existing work, analyzing only sensitive permissions and sensitive APIs reduces analysis time while achieving better classification results; the F-Measure reaches 98.1%. (2) A classification method for Android malware families that combines a CNN with a multi-head attention mechanism. To improve the generalization ability of the model, Intent and Activity features that reflect malware behavior are added to the sensitive permissions and sensitive APIs. With more feature types, shallow machine learning algorithms have difficulty mining the hidden relationships between features in the feature space, so a convolutional neural network and a multi-head attention mechanism are used for family classification experiments. Unlike ordinary attention, multi-head attention runs several attention heads in parallel, each assigning its own weights. The multi-head attention mechanism is combined with a convolutional neural network, which extracts local features and effectively reduces dimensionality, for the classification experiments. Experimental results show that the F-Measure of this method reaches 99.25%, which is significantly better than the other deep learning algorithms. (3) Family homology analysis. Building on the family classification, this work analyzes in depth the permission, API, Intent, and Activity features of each family to trace family homology. In homology analysis experiments on representative features, the F-Measure reaches 86.9%, indicating that these features distinguish different families well. In addition, hierarchical clustering experiments on 20 families further verify the similarity between families and provide some basis for studying how families evolve. |
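
The hierarchical clustering step in contribution (3) can be illustrated with a minimal sketch. The abstract does not state which distance measure or linkage was used, so the Jaccard distance and average linkage below are assumptions, and the family names and feature sets are hypothetical placeholders rather than data from the experiments.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two feature sets (assumed measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, features):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(
        jaccard_distance(features[x], features[y]) for x in c1 for y in c2
    )
    return total / (len(c1) * len(c2))


def hierarchical_clustering(features):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(name,) for name in sorted(features)]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], features),
        )
        merges.append((c1, c2, average_linkage(c1, c2, features)))
        merged = tuple(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
    return merges


# Hypothetical families, each described by a set of sensitive
# permissions and APIs (placeholder values, not experimental data).
FAMILY_FEATURES = {
    "FamilyA": {"SEND_SMS", "READ_CONTACTS", "sendTextMessage"},
    "FamilyB": {"SEND_SMS", "READ_CONTACTS", "getDeviceId"},
    "FamilyC": {"ACCESS_FINE_LOCATION", "getLastKnownLocation"},
}

merges = hierarchical_clustering(FAMILY_FEATURES)
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

Families that share more sensitive features are merged earlier, so the merge order gives a rough picture of which families are most similar; with real data the sets would hold the permission, API, Intent, and Activity features of each family.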