Prediction Of Effects Associated With Single-point Protein Mutations And Study Of Mutation Databases

Posted on: 2011-09-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Gao
GTID: 1100330332972678
Subject: Bioinformatics
Abstract/Summary:
Mutations are changes in DNA sequences and include substitutions, insertions and deletions, amplifications, and other types. They can be roughly classified into natural mutations, randomly induced mutations, and site-directed induced mutations. In forward genetics, researchers start with a mutant phenotype arising from natural mutation or random mutagenesis experiments and work toward identifying the mutated genes. In reverse genetics, building on large-scale genome sequencing, researchers use site-directed mutagenesis experiments to study the functions of genes or other elements in DNA sequences, the structures of RNA or proteins, and other properties. Mutagenesis experiments play an indispensable role in basic biological research (e.g. the investigation of protein structure-function relationships and the identification of DNA-protein interaction sites) and in applications (e.g. drug design and gene therapy).
The rapidly accumulating molecular biology data from mutagenesis experiments have made it possible to study mutation problems systematically with bioinformatics methods. To facilitate such studies, a number of mutation databases have been developed. However, the heterogeneity of these databases makes it difficult to submit, exchange, and use mutation data. The Human Variome Project (HVP) was initiated to provide unified, standardized, and high-quality mutation data, which raises the issue of integrating and standardizing existing mutation databases.
Data mining and knowledge discovery based on mutation databases form another important class of tasks in the HVP. Among these, one of the most significant challenges is predicting the effects of protein point substitutions (mutations). Prediction results can be used directly to guide biological experiments, and this kind of research lays a foundation for further study of related biological problems (e.g. studies of protein function).
The research presented in this dissertation consists of two parts.
In the first part (chapter 2), we introduce the HVP and its development, and then address problems related to the integration and standardization of mutation databases. We propose the Hierarchical Entity-Relation Graph (HERG) model, which can be used to depict published molecular biology databases graphically. The HERG model can also be extended into a basic model within a unified framework for standardizing heterogeneous databases. In the second part (chapter 3), we report a novel substitution-matrix-based kernel for support vector machines (SVMs) and its application to predicting the effects of protein point substitutions (mutations). We demonstrate the advantages of the new kernel over classical SVM kernels on a large dataset extracted from the Protein Mutant Database (PMD). We conclude this part with a discussion of the meaning of substitution-matrix-based kernel functions in terms of information theory.
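The abstract does not give the form of the substitution-matrix-based kernel, so the following is only a minimal sketch of one way such a kernel could be built, not the dissertation's definition. It assumes that each residue is represented by its row of substitution scores, that two residues are compared by the dot product of their rows (which is always positive semidefinite), and that two point substitutions are compared by multiplying the wild-type and mutant residue kernels. The four-residue score table and the names `aa_kernel`, `mutation_kernel`, and `gram_matrix` are hypothetical and chosen for illustration.

```python
# Toy substitution scores for four residues. Illustrative values only;
# a real application would use a full 20x20 matrix.
RESIDUES = ["A", "R", "N", "D"]
SCORES = [
    [4, -1, -2, -2],
    [-1, 5, 0, -2],
    [-2, 0, 6, 1],
    [-2, -2, 1, 6],
]
INDEX = {aa: i for i, aa in enumerate(RESIDUES)}


def aa_kernel(a, b):
    """Dot product of the substitution-score rows of residues a and b.

    Assumption: using rows as feature vectors makes the kernel positive
    semidefinite by construction.
    """
    row_a = SCORES[INDEX[a]]
    row_b = SCORES[INDEX[b]]
    return sum(x * y for x, y in zip(row_a, row_b))


def mutation_kernel(m1, m2):
    """Kernel between two point substitutions given as (wild_type, mutant).

    Assumption: the product of the wild-type and mutant residue kernels,
    which stays positive semidefinite.
    """
    (wt1, mut1), (wt2, mut2) = m1, m2
    return aa_kernel(wt1, wt2) * aa_kernel(mut1, mut2)


def gram_matrix(mutations):
    """Pairwise kernel values for a list of (wild_type, mutant) pairs."""
    return [[mutation_kernel(p, q) for q in mutations] for p in mutations]


if __name__ == "__main__":
    example = [("A", "R"), ("N", "D"), ("R", "A")]
    for row in gram_matrix(example):
        print(row)
```

A Gram matrix produced this way could in principle be passed to an SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.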
Keywords/Search Tags: mutation, prediction, SVMs, substitution-matrix, human variome, data model, standardization