Bioinformatics Research On Disease-related Amino Acid Variations

Posted on: 2016-05-13  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y Yang
GTID: 1224330482465832  Subject: Systems Biology
Abstract/Summary:
Diseases are related to both environmental and genetic factors. In recent years, many amino acid variations have been found in patients with complex diseases. With the development of precision medicine, which is characterized by personalized medicine, it has become increasingly important to study the variations that may affect protein function and cause disease. The effects of amino acid variations on protein structure and function should therefore be analyzed first. Compared with experimental methods, bioinformatics methods based on computational models, e.g. machine learning algorithms, have the advantage of reducing both time and cost.

The thesis covers three main aspects: (1) the research and design of methods for analyzing amino acid variations; (2) algorithm optimization and software development; and (3) the application of these methods and tools to specific diseases.

For method research and design, we first identified three types of protein conservation based on Multiple Sequence Alignments (MSA) and information theory. Different ways of grouping amino acids, such as grouping by physicochemical properties, were introduced into the algorithm. The algorithm can calculate not only the conservation of a single variant residue, but also the co-evolution of residue pairs and triplets. We then used structural properties, such as the contact energy change (dCE) calculated with a coarse-grained model, as input features to improve the accuracy of a protein structural stability predictor. To predict the effect of amino acid substitutions on protein solubility, we collected from the literature the largest reported dataset of solubility-affecting amino acid variations and, after feature selection, used it to train a predictor called PON-Sol based on two-layer Random Forests. Rather than predicting only two classes, PON-Sol distinguishes both solubility-decreasing and solubility-increasing variants from those that do not affect solubility. On an independent test set, it achieves a higher ratio of correct predictions than other methods.

For algorithm optimization and software development, we implemented the three-type conservation algorithm in a GUI-based program called ProCon, written in Java. The algorithm was optimized in detail, and the tool's functions include MSA analysis, conservation calculation, statistics on the distribution of co-evolving residue pairs and triplets, and visualization in the corresponding 3-D protein structure. PON-Sol was built as an online web service on the Django platform with R packages. It can predict effects on protein solubility for batches of variations from different proteins, or for all possible variations in a specific protein.

Variations related to neurodegenerative diseases (NDDs) were chosen for analysis as an application of the above methods and tools. Mutation data for all reported NDDs were manually extracted and stored as a database on the LOVD 3.0 platform. The database contains over 4600 variations related to 37 individual NDDs, drawn from about 1800 papers in PubMed. We analyzed more than 200 amino acid variations from 3 NDDs, and a further 33 variations related to multiple NDDs, by calculating their conservation properties and predicting their effects on protein structure and solubility. Some variations were suggested for further study.

This project is a useful exploration of the systematic study of the effects of disease-related amino acid variations. The analysis models and algorithms achieved good performance, and all the datasets and tools are freely available to all researchers. The work should benefit further research on complex diseases.
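
As an illustration of the information-theoretic conservation measures described in the abstract, the following minimal Python sketch scores one alignment column by Shannon entropy and scores co-variation between two or more columns. It is not the ProCon implementation. The four-way residue grouping, the normalization by the logarithm of the alphabet size, the use of total correlation for triplets, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical grouping by physicochemical property (an assumption for
# illustration; the thesis does not specify its grouping schemes here).
GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}
RESIDUE_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def entropy(symbols):
    """Shannon entropy in bits of a list of hashable symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def observed(msa, *positions, grouped=False):
    """Collect the residues at the given columns, one tuple per sequence.

    Rows containing a gap or a non-standard residue at any requested
    column are skipped (a simplifying assumption).
    """
    rows = []
    for seq in msa:
        residues = tuple(seq[p] for p in positions)
        if any(r not in RESIDUE_TO_GROUP for r in residues):
            continue
        if grouped:
            residues = tuple(RESIDUE_TO_GROUP[r] for r in residues)
        rows.append(residues)
    return rows


def conservation(msa, position, grouped=False):
    """1 minus normalized entropy: 1.0 means the column is fully conserved."""
    rows = observed(msa, position, grouped=grouped)
    alphabet_size = len(GROUPS) if grouped else len(RESIDUE_TO_GROUP)
    return 1.0 - entropy(rows) / math.log2(alphabet_size)


def covariation(msa, *positions, grouped=False):
    """Sum of per-column entropies minus the joint entropy.

    For two columns this is the mutual information. For three columns it
    is the total correlation, used here as one possible triplet score.
    """
    rows = observed(msa, *positions, grouped=grouped)
    marginals = sum(
        entropy([row[k] for row in rows]) for k in range(len(positions))
    )
    return marginals - entropy(rows)


if __name__ == "__main__":
    # Toy alignment, invented for this example.
    msa = ["AKDL", "AKDL", "ARDI", "AREI"]
    print("conservation of column 0:", conservation(msa, 0))
    print("grouped conservation of column 1:", conservation(msa, 1, grouped=True))
    print("pair co-variation of columns 1 and 3:", covariation(msa, 1, 3))
    print("triplet co-variation of columns 1, 2, 3:", covariation(msa, 1, 2, 3))
```

Grouping residues before computing entropy makes a substitution within the same group (for example K to R) count as conserved, which is the motivation for introducing grouping schemes into a conservation score.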
Keywords/Search Tags: amino acid variation, conservation of amino acid residues, protein structure, protein solubility, neurodegenerative diseases