
Predicting Properties Of Compound Based On Graph Mining

Posted on: 2019-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: X D Wang
Full Text: PDF
GTID: 2371330569486822
Subject: Software engineering
Abstract/Summary:
With the continued development of artificial intelligence, computers are increasingly used to process large volumes of information more efficiently. Combining AI with the experience of chemists has sharply increased the number of synthesized compounds, so distinguishing the properties of unknown compounds quickly and efficiently has become an important practical problem. Compound property prediction analyzes known compounds to find rules for classifying the biochemical activity of unknown compounds. In machine learning terms, a classification model is constructed by learning from a training set. When building the prediction model, atoms are treated as vertices and the interactions between atoms as edges, so that each compound structure becomes graph data and compound classification becomes a graph classification problem. Because negative samples are expensive to obtain through biochemical tests, only one class in the sample set is labeled, which makes this a one-class classification setting. The classification of compounds in this thesis can therefore be summarized as a one-class graph classification problem. To solve it, this thesis proposes a method for predicting compound properties based on graph mining. The main results are as follows:

(1) Extraction of feature subgraphs from compounds. Feature subgraphs are extracted from compound graphs with a frequent subgraph mining algorithm, and the AC-gSpan algorithm (Adaptive CloseGraph-Based Substructure Pattern Mining) is proposed for this purpose. The traditional gSpan (Graph-Based Substructure Pattern Mining) algorithm needs to know the upper bound of the support parameter for the compound graphs before mining, and the frequent subgraphs it returns contain a lot of repeated information. AC-gSpan uses the initial frequency of single edges and sets mining stages to remove the need for a prior support parameter, and it mines closed frequent subgraphs to reduce redundant subgraphs. The experimental results show that the adaptive closed frequent subgraph mining algorithm significantly improves mining efficiency and that the mined feature subgraphs effectively represent the compound graph data.

(2) Classification and analysis of compound properties. The predictive classification model is built with a one-class ensemble classification method. The OC-Adaboost algorithm (One Class Adaptive Boosting), a one-class ensemble classification method based on Adaboost, is proposed to solve the compound classification problem. One-class classification learns from a training set of a single class to construct a description model for that class. It is sensitive to parameter selection, and the resulting description of the class shows good recall but poor accuracy and stability. OC-Adaboost reduces the dependence of classifier performance on parameter tuning and uses sampling with replacement to weaken the influence of any single one-class base classifier on the whole ensemble, which strengthens the ensemble effect and generalization ability. The experiments indicate that the OC-Adaboost classification method achieves better accuracy and generalization.
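
To make the graph view of a compound concrete, the following is a minimal Python sketch of atoms as labeled vertices, bonds as labeled edges, and the per-graph frequency (support) of each single labeled edge. It is not the AC-gSpan algorithm, whose details the abstract does not give. It only illustrates the kind of single-edge frequency that the abstract says AC-gSpan starts from. The function names, the `(atoms, bonds)` encoding, and the toy compounds (hydrogens omitted) are all assumptions made for illustration.

```python
from collections import Counter

# Assumed encoding: a compound graph is a pair (atoms, bonds), where
# atoms maps a vertex id to an element symbol and bonds is a list of
# (u, v, bond_order) tuples. This encoding is hypothetical.


def edge_labels(graph):
    """Return the set of distinct labeled edges in one compound graph."""
    atoms, bonds = graph
    labels = set()
    for u, v, order in bonds:
        # Sort the endpoint labels so that C-O and O-C count as the same edge.
        a, b = sorted((atoms[u], atoms[v]))
        labels.add((a, order, b))
    return labels


def single_edge_support(graphs):
    """Count, for each labeled edge, the number of graphs that contain it."""
    support = Counter()
    for graph in graphs:
        # A set is used per graph, so each graph contributes at most 1.
        support.update(edge_labels(graph))
    return support


def frequent_edges(graphs, min_support):
    """Keep the labeled edges that occur in at least min_support graphs."""
    support = single_edge_support(graphs)
    return {edge: n for edge, n in support.items() if n >= min_support}


# Toy compounds with hydrogens omitted (illustrative data only).
ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1, 1), (1, 2, 1)])
acetaldehyde = ({0: "C", 1: "C", 2: "O"}, [(0, 1, 1), (1, 2, 2)])
methylamine = ({0: "C", 1: "N"}, [(0, 1, 1)])
compounds = [ethanol, acetaldehyde, methylamine]

support = single_edge_support(compounds)
common = frequent_edges(compounds, min_support=2)
```

In this toy set only the single C-C bond appears in two of the three graphs, so it is the only edge that survives a support threshold of 2. A gSpan-style miner would grow larger candidate subgraphs from such frequent edges. How AC-gSpan derives its staged thresholds from these counts is not specified in the abstract.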
Keywords/Search Tags: graph classification, frequent subgraph mining, feature selection, one-class classification, Adaboost