Data Mining Applied In Molecular Phylogeny And Quantitative Structure-Activity Relationship Modeling

Posted on: 2009-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: C J Wang
GTID: 2120360272995584
Subject: Agricultural Entomology and Pest Control
Abstract/Summary:
The explosive growth in stored and transient data has created an urgent need for new techniques and automated tools that can help transform vast amounts of data into useful information and knowledge. Data mining is defined as uncovering meaningful, previously unknown information from a mass of data. It has been an emerging field since the mid-1990s, boosted by the flood of data on the Internet. Data mining is a multidisciplinary field, drawing on areas including database technology, machine learning, statistics, pattern recognition, information retrieval, neural networks, knowledge-based systems, artificial intelligence, high-performance computing, and data visualization. Generally, data mining techniques deal with three major problems: association, classification and prediction, and clustering.

This research consists of two parts. In the first part, a novel alignment-free method, multiscale association (MSA), was proposed and applied to analyse the phylogenetic relationships of coronaviruses and of the section Holometabola of Insecta. The phylogenetic tree constructed with MSA is consistent with those of previous analyses.

In the second part, a nonlinear prediction approach named SVR-KNN was developed, based on support vector regression (SVR), k-nearest neighbor (KNN) and combinatorial prediction. It was applied to quantitative structure-activity relationship (QSAR) modeling of the antibacterial bioactivities of 48 cephalosporin compounds against Haemophilus influenzae. The results show that SVR-KNN has strong prediction ability and outstanding generalization ability, and it is expected to be widely applicable to QSAR studies of other compounds.
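
The abstract does not describe how MSA works, so the sketch below does not reproduce it. It only illustrates the general idea of an alignment-free comparison by using a different, well-known technique: k-mer frequency profiles compared by Euclidean distance. The function names, the choice of k, and the toy sequences are all assumptions made for illustration and do not come from the thesis.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k):
    """Relative frequencies of all overlapping k-mers in seq."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(key, 0.0) - q.get(key, 0.0)) ** 2 for key in keys))


def distance_matrix(seqs, k=3):
    """Pairwise distances between sequences; no alignment is computed."""
    profiles = [kmer_profile(s, k) for s in seqs]
    return [[profile_distance(a, b) for b in profiles] for a in profiles]


# Toy sequences invented for illustration only (not data from the thesis).
toy = ["ACGTACGTACGT", "ACGTACGAACGT", "TTTTGGGGCCCC"]
d = distance_matrix(toy, k=2)
```

A distance matrix of this kind could then be passed to a standard tree-building method such as neighbor joining. Whether the thesis builds its trees this way is not stated in the abstract.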
Keywords/Search Tags: data mining, phylogeny, quantitative structure-activity relationship, support vector regression, k-nearest neighbor