| As the Android OS has become widely used, Android malware has increased rapidly and severely threatens the information security of Android platforms. Malware can steal users' sensitive information, fraudulently deduct fees, and steal funds, resulting in serious economic losses. It is therefore urgent to study Android malware detection technology. Malware family classification makes it possible to manage Android malware samples systematically: using information about existing Android malware families, a new sample can be given a preliminary identification of its family, malicious behavior, attack purpose, and so on. Malware family classification is an active topic in the malware detection area. This thesis mainly studies malware classification based on dynamic malware analysis. At present, research on Android malware family classification based on dynamic analysis still has several problems, such as the low success rate of dynamic malware analysis, coarse classification granularity, and low classification accuracy. To address these problems, the main contributions of this thesis are: 1. For the low success rate of dynamic malware analysis, we apply Androguard, MonkeyRunner, and other advanced reverse analysis techniques to modify parts of the existing dynamic analysis, such as the way the package name is extracted and the way malicious behavior is triggered, in order to optimize the dynamic analysis methodology. We also design a pseudo-event trigger based on malicious events. We test the improved solution on the Drebin dataset, and the experimental results show that it greatly improves the success rate of dynamic malware analysis. 2. Building on the improved dynamic analysis above, we refine the classification granularity and combine SVM and DBSCAN to design a classification algorithm for small families. The experiment is divided into three classification sizes (L, M, S) according to the number of malware samples. The final malware family classification accuracy is 81% (L), 75% (M), and 59% (S). 3. For the low accuracy of large family classification, we combine two kinds of dynamic features: resource consumption, such as CPU and memory usage, and state sequences modeled with a Markov chain. The results show that these two kinds of dynamic features better reflect malicious behavior and greatly improve the accuracy of existing large family classification. |
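
The abstract does not say how the state sequences are defined or how the Markov chain is turned into features. As a minimal sketch of one common approach, assuming each sample's dynamic trace has already been reduced to a sequence of discrete states, the first-order transition probabilities can be estimated and flattened into a fixed-length feature vector. The function name `transition_features` and the state labels `net`, `file`, and `sms` are hypothetical and are not taken from the thesis.

```python
def transition_features(sequence, states):
    """Estimate a first-order Markov transition matrix from one state
    sequence and return it flattened row by row as a feature vector.

    `states` fixes the order of rows and columns so that vectors from
    different samples are comparable. A state in `sequence` that is not
    listed in `states` raises KeyError.
    """
    index = {state: i for i, state in enumerate(states)}
    n = len(states)

    # Count how often each state is immediately followed by each other state.
    counts = [[0] * n for _ in range(n)]
    for src, dst in zip(sequence, sequence[1:]):
        counts[index[src]][index[dst]] += 1

    # Normalize each row into probabilities; rows for states that never
    # appear as a source are left as all zeros.
    features = []
    for row in counts:
        total = sum(row)
        features.extend((c / total if total else 0.0) for c in row)
    return features


if __name__ == "__main__":
    # Hypothetical trace: the labels are illustrative assumptions only.
    states = ["net", "file", "sms"]
    trace = ["net", "file", "net", "sms", "net", "file"]
    print(transition_features(trace, states))
```

A vector of this kind could then be concatenated with resource-consumption measurements such as CPU and memory usage before being passed to a classifier; how the thesis actually combines the two feature types is not stated in the abstract.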