| Android is the most widely used smartphone operating system, and it has become the main target of a large volume of malware attacks. Increasingly serious Android security problems expose users to threats such as property loss and privacy disclosure, so research on stable and effective Android malware detection methods is of great significance for protecting users' information and property. Machine learning is now widely used in Android malware detection research. Because API features carry rich semantic information, many static analysis studies use Android APIs as the features of machine learning models. However, as Android versions and app versions are updated, newly introduced APIs bring the curse of dimensionality and shifts in the data distribution, which cause the model to age rapidly. This thesis focuses on the model aging caused by new APIs and proposes detection methods that effectively delay it. The main work is as follows: (1) To address the curse of dimensionality and the rapid change in data distribution caused by Android version updates, this thesis proposes a feature enhancement method based on API semantics. First, call sequences containing API semantic information are extracted from statically analyzed code, and each API is embedded as a vector using semantic analysis techniques. Next, a clustering algorithm groups APIs with similar semantics to obtain an initial set of cluster features. Finally, a cluster expansion algorithm assigns each new API to its nearest cluster, which reduces the feature dimension and keeps the feature space stable. The method is validated experimentally on a collected data set covering 2016-2020. The results show that, compared with other feature enhancement schemes, this method maintains good detection performance while reducing time overhead. (2) To address the need to update an aging model in time, this thesis proposes an Android malware detection method based on incremental learning. Several different sub-classifiers are combined for classification, and aging scores are designed by exploiting the fact that different models age at different rates. The aging score is used to judge how far a model has aged, and the classification results of the ensemble model serve as pseudo-labels for incrementally updating the aging model. Experimental results show that, compared with traditional machine learning schemes, the incremental ensemble model improves accuracy by 6.1% on average while saving about 30% of manual labeling costs. (3) To combine the advantages of the feature enhancement method and the incremental ensemble model, this thesis designs and implements an Android malware detection scheme based on API semantics and incremental learning. The experiments show that, compared with other schemes for delaying model aging, the proposed scheme handles new APIs effectively. After one year the F-measure reaches 94.5%, and after five years it still maintains an accuracy of 90.9%, achieving the goal of delaying model aging. |
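
The cluster expansion step in (1) can be illustrated with a minimal sketch. The abstract does not state which distance measure or embedding model the thesis uses, so cosine similarity, the three-dimensional toy vectors, the cluster names, and the function names `cosine_similarity` and `assign_to_nearest_cluster` are all assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_to_nearest_cluster(api_vector, centroids):
    """Return the id of the cluster whose centroid is most similar to api_vector.

    The similarity measure is an assumption; the thesis may use a different one.
    """
    return max(centroids, key=lambda cid: cosine_similarity(api_vector, centroids[cid]))


# Hypothetical centroids of clusters learned from APIs seen at training time.
centroids = {
    "network": [0.9, 0.1, 0.0],
    "storage": [0.1, 0.9, 0.1],
    "telephony": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of an API introduced in a later Android version.
new_api_vector = [0.8, 0.2, 0.1]

cluster_id = assign_to_nearest_cluster(new_api_vector, centroids)
print(cluster_id)
```

Because the new API is mapped onto an existing cluster, the feature vector keeps the same length as at training time, which is the property the abstract describes as keeping the feature space stable.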
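
The aging score and pseudo-labeling idea in (2) can be sketched in the same spirit. The abstract does not define the aging score, so the definition used here (the fraction of samples on which a sub-classifier disagrees with the ensemble's majority vote), the 0.5 threshold, and the names `majority_vote` and `aging_scores` are hypothetical stand-ins, not the thesis's formulation.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one ensemble label per sample."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]


def aging_scores(predictions, ensemble_labels):
    """Assumed aging score: share of samples where a model disagrees with the ensemble."""
    total = len(ensemble_labels)
    return [
        sum(p != e for p, e in zip(model_preds, ensemble_labels)) / total
        for model_preds in predictions
    ]


# Hypothetical predictions of three sub-classifiers on four new, unlabeled apps
# (1 = malware, 0 = benign).
predictions = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
]

pseudo_labels = majority_vote(predictions)        # ensemble output used as pseudo-labels
scores = aging_scores(predictions, pseudo_labels)

AGING_THRESHOLD = 0.5  # illustrative cut-off, not taken from the thesis
aged_models = [i for i, score in enumerate(scores) if score > AGING_THRESHOLD]
print(pseudo_labels, scores, aged_models)
```

In a full pipeline, the models listed in `aged_models` would then be updated incrementally on the new samples with `pseudo_labels` as targets, which is where the saving in manual labeling would come from.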