| With the rapid development and widespread adoption of mobile smart terminal technology, Android applications have become a popular form of software, and applications with a wide range of functions are continuously being developed. Although these applications meet people’s needs, their security problems have become increasingly prominent and have led to several notable security incidents. Because the Android system is open source and application markets are weakly regulated, users may download malicious applications, with serious consequences such as leaks of personal information or loss of funds from electronic accounts. It is therefore important to detect all kinds of Android malware accurately in order to reduce the harm they cause. Most existing malware detection methods either extract behavior patterns mainly from the function call graph of the application under test, or extract text features from the code text without capturing code patterns. In view of this, this thesis focuses on malicious code patterns in code text and proposes a malware detection method based on Term Frequency-Inverse Document Frequency (TF-IDF). The method uses TF-IDF to construct word vectors of sensitive function calls, which represent malicious patterns in the code text. However, detection methods based on a single feature have limitations such as missed detections and false positives. To overcome these shortcomings, this thesis draws on the theory of selective ensemble learning to construct AMD-SEL (Android Malware Detection based on Selective Ensemble Learning), an Android malware detection method that analyzes multiple features. The method considers five distinct features: the program code text, the TF-IDF values of sensitive function calls in the code text, the centrality of sensitive functions, the intimacy of sensitive functions, and the tuples of sensitive functions. These five kinds of features are extracted from the application under test by static analysis, and base subclassifiers are then trained with five typical machine learning algorithms. Next, following the idea of selective ensemble learning, a heuristic search algorithm assigns a weight to each subclassifier. The subclassifiers and their corresponding weights together form an ensemble Android malware detection model. To verify the effectiveness of AMD-SEL, experiments were conducted on a publicly available dataset of real Android applications using four evaluation metrics: precision, recall, F1, and accuracy. The experimental results show that AMD-SEL outperforms detection methods based on a single feature on all four metrics. This study can therefore serve as a reference for improving the security of Android applications. |
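
The two mechanisms named in the abstract, TF-IDF vectors over sensitive function calls and heuristically weighted subclassifiers, can be illustrated with a minimal sketch. It is not the thesis implementation. The abstract does not specify the TF-IDF variant or the heuristic search algorithm, so the sketch assumes the common `tf * log(N / df)` form and uses simple hill climbing as a stand-in. All function names (`tfidf_vectors`, `weighted_vote`, `ensemble_accuracy`, `search_weights`) are hypothetical, and the API call names in the usage note only serve as example tokens.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Compute TF-IDF vectors for tokenized documents.

    docs: list of token lists (here, names of sensitive API calls per app).
    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    vocab = sorted(doc_freq)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty docs
        vectors.append([
            (counts[tok] / total) * math.log(n_docs / doc_freq[tok])
            for tok in vocab
        ])
    return vocab, vectors


def weighted_vote(probs, weights):
    """Combine per-subclassifier malware probabilities into a 0/1 label."""
    total = sum(weights)
    if total == 0:
        return 0  # no subclassifier selected: default to benign
    score = sum(w * p for w, p in zip(weights, probs)) / total
    return 1 if score >= 0.5 else 0


def ensemble_accuracy(all_probs, labels, weights):
    """Accuracy of the weighted ensemble on a validation set."""
    hits = sum(weighted_vote(p, weights) == y
               for p, y in zip(all_probs, labels))
    return hits / len(labels)


def search_weights(all_probs, labels, steps=200, seed=0):
    """Hill climbing over weights in [0, 1] (stand-in heuristic search).

    A weight of 0 effectively removes a subclassifier from the ensemble,
    which is the 'selective' part of selective ensemble learning.
    """
    rng = random.Random(seed)
    n = len(all_probs[0])
    best = [1.0] * n
    best_acc = ensemble_accuracy(all_probs, labels, best)
    for _ in range(steps):
        cand = list(best)
        i = rng.randrange(n)
        cand[i] = min(1.0, max(0.0, cand[i] + rng.uniform(-0.3, 0.3)))
        acc = ensemble_accuracy(all_probs, labels, cand)
        if acc >= best_acc:  # accept ties so the search can cross plateaus
            best, best_acc = cand, acc
    return best, best_acc
```

For example, `tfidf_vectors([["sendTextMessage", "getDeviceId"], ["getDeviceId"]])` gives a nonzero weight only to `sendTextMessage` in the first app, because `getDeviceId` occurs in every app and its IDF is zero. `search_weights` takes one row of subclassifier probabilities per validation app and returns the best weights found together with their validation accuracy, which is never lower than that of uniform weights.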