| With the spread of wireless networks and fourth-generation (4G) mobile communication technology, the mobile intelligent terminal is becoming an indispensable part of daily life, and smartphones are now used more regularly than personal computers. As the most widely used operating system on mobile terminals, the Android platform is closely tied to users' private information and financial security. As a result, more and more malware targets the Android platform. As in organisms, genes also exist in malware, where they mark the inheritance and variation of malware. Code duplication and similar programming habits often reveal the homology of malware, expressing the inheritance and variability of malware families. However, there is little research on the homology of malware families, and there is no uniform, systematic method for analyzing them. To resolve these problems, this paper proposes the Android malware gene. Several kinds of malware fragments are extracted to show the malicious features of a malware family, which helps in analyzing the homology of malware. The main research contents and innovations of this paper are as follows: 1. The definition and extraction of the Android malware gene are proposed in order to build the basic Android malware genebank. Genes are extracted by several methods to reflect the homology of Android malware. Among these, the code fragment gene is analyzed based on use-def chains, which are a kind of minimal semantic-based data flow. In addition, the resource fragment gene and the configuration file gene are extracted from fragments that show malware family characteristics. All genes are then formalized. Finally, the basic Android malware genebank is built via screening, and different methods of further screening are used for different applications. 2. An Android malware clustering model and a homology analysis framework are built on the basic genebank. Currently, there are no accurate family labels for Android 
malware. Android malware is clustered by comparing different machine learning methods. After comparison with anti-virus labels, the clustering results are interpreted in terms of the similarity and homology among current families. The malware homology analysis framework is then built on the clustering results to study the homology of different Android malware. 3. A detection framework and a classification framework for Android malware are built. Based on the Android malware detection genebank, an Android malware detection system is built with a support vector machine (SVM), achieving a recall of 98.37%, which means that the system misses few malicious samples. An Android malware classification system is built with a multi-class SVM, using various malware labels. Among those labels, the one based on the clustering results gives the best classification performance. Through this model, Android malware classification is realized, and the effectiveness and consistency of gene-based Android malware research are verified. |
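
As a rough illustration of the use-def chains on which the code fragment gene is based, the sketch below links each register use in a straight-line block to the instruction that last defined that register. The instruction format, the register names, and the function `use_def_chains` are hypothetical and chosen only for illustration; this is not the paper's extraction method, and real Dalvik bytecode would need a control-flow-aware reaching-definitions analysis rather than a single linear pass.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical instruction format (an assumption for illustration):
# (destination register or None, tuple of source registers).
Instruction = Tuple[Optional[str], Tuple[str, ...]]


def use_def_chains(block: List[Instruction]) -> Dict[Tuple[int, str], Optional[int]]:
    """Map each use (instruction index, register) to the index of the
    instruction that most recently defined that register, or to None if the
    register is defined outside the block (for example, a method parameter)."""
    last_def: Dict[str, int] = {}
    chains: Dict[Tuple[int, str], Optional[int]] = {}
    for index, (dest, sources) in enumerate(block):
        # Resolve uses before recording this instruction's own definition,
        # so that `v0 = g(v0, v1)` links to the previous definition of v0.
        for reg in sources:
            chains[(index, reg)] = last_def.get(reg)
        if dest is not None:
            last_def[dest] = index
    return chains


# Toy straight-line block loosely modeled on register-based bytecode.
block: List[Instruction] = [
    ("v0", ()),            # 0: v0 = <constant>
    ("v1", ("v0",)),       # 1: v1 = f(v0)
    ("v0", ("v0", "v1")),  # 2: v0 = g(v0, v1)
    (None, ("v0", "p0")),  # 3: call h(v0, p0); p0 comes from outside the block
]
chains = use_def_chains(block)
```

Here the use of `v0` at instruction 3 is linked to the redefinition at instruction 2 rather than to the original definition at instruction 0, which is the kind of minimal data-flow relation a use-def chain records.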