The booming development of the Android market has made mobile devices more popular and convenient. However, in the complex Android environment, how to identify malicious software efficiently and accurately has become a research focus. Various types of disguised malware lurk in web pages, links, and major application stores, and protecting people's privacy and property has become an urgent issue in the development of the Android platform. In addition, with the emergence of camouflage and obfuscation technologies, malware can easily evade conventional detection methods, which poses a serious challenge to traditional Android malware detection technology. Therefore, this thesis takes the network traffic, static information, and dynamic call information of Android software as feature objects and conducts a series of studies with the goal of accurately classifying various kinds of Android malware.

Existing Android malware classification methods mostly adopt static analysis and dynamic tracking, which new types of malware can easily evade through disguise techniques. At the same time, network traffic detection methods often reuse the traffic detection techniques designed for PC devices. Although these methods improve detection accuracy, they still cannot adapt to lightweight Android devices and they produce imbalanced classification results. A single feature dimension and the lack of targeted network traffic detection methods therefore cannot effectively complete the multi-class classification task for Android malware. Based on this, this thesis achieves more accurate and stable malware classification by fusing multiple classes of Android software features and improving network traffic detection technology. In addition, to make the software classification process visual and operable, this thesis designs and implements an automated Android malware classification
prototype system. The main research content and contributions of this thesis are summarized as follows:

1. Detection methods based on a single feature and on traditional traffic detection techniques often rely on large datasets and cannot effectively deal with the various kinds of disguised Android malware. To achieve high-precision malware classification with a lightweight data volume, this thesis proposes an Android-based Malware Classification Method using Feature-fusion and NLP (AMCM-FN), which fuses the multi-dimensional features of Android software and further improves the accuracy, recall, F1-measure, and other indicators of multi-class malware classification. The study proposes a multi-feature fusion hierarchical model and a multi-feature fusion algorithm based on the application-layer HTTP traffic, permission information, and API information of Android software. In addition, the AMCM-FN model introduces improved natural language processing techniques to generate fusion vectors with richer semantic information from the multi-dimensional features, addressing the problems that the traditional bag-of-words model and one-hot encoding are too sparse and ignore the relationships between words. To verify the effectiveness and advancement of the method, this thesis compares it experimentally with existing malware classification methods on multiple indicators. The results show that AMCM-FN has clear advantages and higher classification performance on common Android software types.

2. Existing Android malware classification methods usually adopt a fixed feature fusion approach, which leads to problems such as imbalanced classification results and wasted resources. To obtain raw traffic information on a larger scale and to select feature fusion schemes dynamically, this thesis proposes an Android malware classification method based on gray-scale image and feature-selection tree (DCM-GIFT). The DCM-GIFT model
focuses on transport-layer detection of Android software and preserves the spatiotemporal sequence characteristics of the original traffic by constructing gray-scale images of transport-layer TCP and UDP traffic. In addition, the DCM-GIFT model uses six types of static information (activities, services, intents, permissions, receivers, providers) and two types of dynamic information (API, memory) as auxiliary features to construct a feature-selection tree. The feature-selection tree construction algorithm proposed in this thesis selects the best feature fusion scheme for each software type, thereby avoiding invalid or inefficient feature combinations and solving the imbalanced classification results caused by fixed feature fusion. Finally, this thesis evaluates the DCM-GIFT model on a publicly available dataset, and the results demonstrate the effectiveness of the proposed method.

3. Given that existing detection systems cannot achieve multi-level feature detection and automated classification for Android software, this thesis designs and implements an Android malware automatic classification system (AMACS). The system adopts a browser/server (B/S) architecture and the Flask framework, and provides modular Android feature extraction, data preprocessing, classification model training, and statistical analysis functions. It is characterized by interactive operation, a high degree of automation, and easy extension. Testing and operation show that the prototype system can accurately classify common Android malware.
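
As an illustration of the gray-scale traffic image idea in contribution 2, the following minimal Python sketch maps the raw bytes of one TCP or UDP flow to a fixed-size image in which each byte becomes one pixel intensity and the row-major layout preserves byte order. It is not the DCM-GIFT implementation: the function name `bytes_to_gray_image`, the default 28 x 28 image size, and the truncate-or-zero-pad policy are assumptions made for this example, since the abstract does not state how flows are split, sized, or padded.

```python
def bytes_to_gray_image(payload: bytes, side: int = 28) -> list[list[int]]:
    """Map a raw byte sequence to a side x side gray-scale image.

    Each byte (0-255) becomes one pixel intensity. Bytes are placed in
    row-major order, so their original order in the flow is preserved.
    The image size and the padding policy are assumptions of this sketch.
    """
    if side <= 0:
        raise ValueError("side must be positive")
    n_pixels = side * side
    # Truncate long flows and zero-pad short ones to exactly n_pixels bytes.
    fixed = payload[:n_pixels].ljust(n_pixels, b"\x00")
    # Slice the fixed-length byte string into rows of `side` pixels each.
    return [list(fixed[row * side:(row + 1) * side]) for row in range(side)]


# Usage: a hypothetical 10-byte flow rendered as a 4 x 4 image.
flow = bytes(range(10))
image = bytes_to_gray_image(flow, side=4)
```

In this example the ten payload bytes fill the first two and a half rows and the remaining pixels are zero. A matrix of this shape could then be fed to an image classifier, with the static and dynamic features described above supplied separately.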