| With the development of the mobile Internet, smartphone use is growing rapidly. Android currently holds a large share of the smartphone operating system market, and the number of Android applications is staggering. Because Android is open source, applications are easier to develop and developers have more freedom; in addition, some app stores do not audit submissions strictly. As a result, a large number of malicious applications have appeared. Meanwhile, the number of applications is expected to grow explosively over the next few years, and malicious code is becoming more sophisticated. In response to these threats, we designed a malicious code discriminant model based on a support vector machine (SVM) with posterior probability output, intended as an aid to analysts and to the antivirus engine. An SVM automatically discovers statistical regularities in data: it builds its model from statistics over a very large number of samples, which makes it difficult for an attacker to work out how to evade detection. The model has been formally deployed at Antiy Labs. Using support vector techniques for the purpose of malicious code detection, this paper builds a discriminant model that assigns malicious code to its family and puts the model into production use, where it assists analysts during sample analysis. The work covers an introduction to Android, data preprocessing, behavior rules, and the training process of the classification model. First, we introduce the basic structure of the Android operating system and analyze the application package format, including the META-INF directory, the res directory, the AndroidManifest.xml file, and the basic structure of the classes.dex file, and we describe the feature extraction method in detail. Second, we experimentally compare three feature selection algorithms (information gain, the chi-square statistic, and document frequency) and find that information gain performs best. The model therefore selects the 5,000 features with the highest information gain to form a feature dictionary, and then computes a TF-IDF weight for each of these 5,000 features to obtain the final feature dictionary. Through this dictionary, any APK or DEX file can be mapped to a numeric vector. For the samples of 85 active families, we train 85 binary classification models based on the posterior probability SVM, and the decision thresholds are adjusted according to the application scenario. In addition, to improve the reliability of the model, we construct rules from frequent itemsets mined from the permissions and other information of the samples. Finally, the model is put online and is continually updated based on feedback from third-party platforms. The model runs on a Linux-based server, and the program is written in Python. Experimental results demonstrate the effectiveness of the model. |
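
The feature pipeline summarized above (information gain ranking, a fixed-size feature dictionary, and TF-IDF weighting) can be illustrated with a minimal Python sketch. It is not the deployed implementation: the function names, the toy samples, and the smoothed IDF formula are all assumptions made for illustration, and the dictionary size is 3 here rather than 5,000.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the sample'."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (present, absent))
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Return the k terms with the highest information gain (the feature dictionary)."""
    vocabulary = sorted({t for d in docs for t in d})
    ranked = sorted(vocabulary, key=lambda t: information_gain(docs, labels, t), reverse=True)
    return ranked[:k]


def document_frequencies(docs, dictionary):
    """Count, for each dictionary term, how many samples contain it."""
    return {t: sum(1 for d in docs if t in d) for t in dictionary}


def tfidf_vector(tokens, dictionary, doc_freq, n_docs):
    """Map one sample's token list to a TF-IDF vector over the dictionary."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    vector = []
    for term in dictionary:
        tf = counts[term] / total
        # Smoothed IDF; the exact weighting variant is an assumption.
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
        vector.append(tf * idf)
    return vector


# Hypothetical toy samples: tokens extracted from a manifest and DEX file.
samples = [
    ["SEND_SMS", "sendTextMessage", "getDeviceId"],
    ["SEND_SMS", "sendTextMessage"],
    ["INTERNET", "getDeviceId"],
    ["INTERNET", "onCreate"],
]
labels = [1, 1, 0, 0]  # 1 = member of the target family, 0 = not

dictionary = select_features(samples, labels, k=3)
doc_freq = document_frequencies(samples, dictionary)
vector = tfidf_vector(samples[1], dictionary, doc_freq, len(samples))
```

Each resulting vector has one entry per dictionary term, so every sample maps to a fixed-length numeric input that a per-family binary classifier could consume.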