Drug research and development is complex work, and obtaining reliable drug efficacy data with efficient and convenient methods has always been a difficult problem. Traditional drug testing is too slow to keep pace with rapidly changing social needs and causes enormous waste. With the progress of biomedicine in recent years, a large number of proteins and small molecules have been characterized and collected into databases. As computer technology has developed, drug screening based on machine learning and deep learning has matured. These emerging techniques can predict the affinity between small-molecule drugs and target proteins and discriminate whether a molecule is active against a target protein, allowing drugs to be screened more effectively and accurately. In this paper, the compound and protein benchmark data sets in the Directory of Useful Decoys (DUD) were tested, molecular activity was classified by machine learning and deep learning methods, and molecular features were digitized to extract the characteristic information in the data. The main contents are as follows:

1. The meaning of virtual screening is explained, recent research on drug screening with machine learning is reviewed, and the theory of the k-nearest neighbor (KNN) algorithm, the support vector machine (SVM) algorithm, and artificial neural networks is introduced in detail.

2. The feature composition and classification methods of proteins are introduced, and protein models are constructed by encoding features for traditional machine learning. Targets from the DUD drug library serve as the benchmark data sets. Principal component analysis is used to reduce the dimensionality of the sample data, and the KNN and SVM algorithms, with optimized parameters, are used to establish classification models for virtual screening. The classification accuracy of the ACE model is obtained by 3-fold cross-validation: SVM and KNN reach accuracies of 73.3% and 67.9%, respectively, with areas under the receiver operating characteristic (ROC) curve of 0.81 and 0.68, respectively.

3. A deep learning method for virtual screening is proposed. A learning network composed of feature encoders is built, and a word-embedding approach is used to learn how to extract features from a given data set, capturing ligand atom and residue type features so that active and inactive molecules can be identified quickly. K-fold cross-validation is used to train and test the ACE protein model, and the precision is as high as 0.97.

Finally, the interaction between the decoy compounds and the target proteins was verified by analyzing the enrichment factor and the AUC value. Experiments show that the virtual screening method based on a deep neural network is the most effective when compared with k-nearest neighbor and support vector machine.
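The two evaluation measures named above, the enrichment factor and the AUC, can both be computed from a list of compounds ranked by predicted score. The following is a minimal sketch and not the code used in this work: the function names `roc_auc` and `enrichment_factor`, the `fraction` parameter, and the toy labels and scores are assumptions made for illustration only. Labels are assumed to be 1 for active compounds and 0 for decoys.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen active is scored
    higher than a randomly chosen decoy (ties count as one half).
    """
    actives = [s for y, s in zip(labels, scores) if y == 1]
    decoys = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def enrichment_factor(labels, scores, fraction=0.1):
    """Hit rate in the top-scored fraction divided by the overall hit rate."""
    n = len(labels)
    total_actives = sum(labels)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, int(round(n * fraction)))
    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits_top = sum(y for _, y in ranked[:n_top])
    return (hits_top / n_top) / (total_actives / n)


if __name__ == "__main__":
    # Toy data for illustration only; not results from this study.
    toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    print("AUC:", roc_auc(toy_labels, toy_scores))
    print("EF (top 20%):", enrichment_factor(toy_labels, toy_scores, 0.2))
```

An enrichment factor above 1 means the top of the ranking holds more actives than random selection would give, and an AUC of 0.5 corresponds to random ranking.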