
Research On Prediction And Functional Analysis Methods Of Intrinsically Disordered Proteins

Posted on: 2014-09-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Hu
GTID: 1261330422490316
Subject: Computer application technology
Abstract/Summary:
“Sequence → Structure → Function” is the traditional view that the amino acid sequence determines the structure of a protein molecule and that a definite protein structure is a prerequisite for biological function. This view has been amended by the finding that a growing number of proteins possess no definite ordered three-dimensional structure but are still involved in key biological processes, including cell cycle and gene regulation, molecular recognition, assembly of complexes, and signaling in general. Indeed, over 33% of eukaryotic proteins contain regions that lack structure. Such proteins are often called “intrinsically disordered proteins” (IDPs). Several studies have shown a strong correlation between disease-associated proteins and proteins containing significant amounts of intrinsic disorder, leading to the D2 concept of “disorder in disease”. Complex diseases such as cancer, neurodegenerative diseases, cardiovascular diseases, and diabetes are often associated with IDPs. With the development and improvement of proteomics, unfolded proteomics, the branch of proteomics based on intrinsically disordered proteins, has attracted attention and become an active research area. This thesis focuses on the prediction of intrinsically disordered regions, the prediction of IDP functions, and the potential role of IDPs in protein-protein interaction networks and genetic elements. The main contents are as follows:

(1) A Bayesian decision tree model based on a latent variable was proposed to predict intrinsically disordered regions. To account for errors introduced by experiments when predicting IDPs, the observed variable and the true variable, which is the latent variable, were modeled separately in the Bayesian model. A latent-variable-based Bayesian decision tree classifier was then proposed, with inference by Markov chain Monte Carlo using the Metropolis-Hastings algorithm. A simulation study showed that the proposed model identified false positives and false negatives satisfactorily. It also achieved better performance in predicting intrinsically disordered regions.

(2) Function prediction for mostly disordered proteins. Because function prediction for ordered proteins differs considerably from that for mostly disordered proteins, a new method for IDP function prediction was proposed. The annotated protein dataset was constructed from predicted data in a protein database. Two types of features were then designed based on the distance between adjacent amino acid residues. Latent semantic analysis was also used to optimize the feature space for training a support vector machine classifier. Evaluations on the DisProt database showed the reliability of the method.

(3) Characterization of the roles of IDPs based on a human protein-protein interaction network with reliable confidence. The overall goal was to establish a reliable human protein-protein interaction network and to characterize the roles of IDPs in the context of the network topology. Various experimental data, both low-throughput and high-throughput, and predicted data were integrated to infer the human protein-protein interaction network. Because the experimental data sets available for human are limited, we borrowed experimental data sets from other organisms and translated protein-protein interactions in these organisms to those in human. The inference of the interaction probability involved a large number of latent variables. The combinatorial effects made it impractical to compute the expectation of the missing variables analytically during the E-step of expectation-maximization (EM). We considered Monte Carlo EM to tackle this difficulty. We then characterized the roles of IDPs by accounting for local and global network topology and the confidence of protein-protein interactions.

(4) Inference of the relationship between single nucleotide variations (SNVs) and disease based on changes in disorder prediction scores. Using the exonic single nucleotide variations identified in the 1000 Genomes Project and distributed by the Genetic Analysis Workshop 17 (GAW17), we systematically analyzed predicted disorder potential features of the non-synonymous variations, especially the mutation-induced changes in disorder prediction scores, called DS. Context dependence of DS values was detected in different situations. We then focused our analysis on the relationship between DS and minor allele frequency. After that, the effect on proteins and disease of SNVs that alter disorder or structure potential was evaluated. The experimental results suggest that a significant change in the tendency of a protein region to be structured or disordered, caused by SNVs, may lead to malfunction of the protein and contribute to disease risk.
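As a rough illustration of the DS quantity described in item (4), the sketch below computes the change in a disorder score caused by a single amino acid substitution. Everything in it is an assumption made for illustration: the abstract does not say which disorder predictor was used, so a toy windowed propensity score stands in for it; the residue groupings, the window size, the sign convention (mutant minus wild type), and the function names `disorder_score` and `delta_score` are all hypothetical.

```python
# Toy stand-in for a disorder predictor (ASSUMPTION: not the thesis's predictor).
# Residue groupings are a coarse illustration of disorder- vs order-promoting types.
DISORDER_PROMOTING = set("PESQKAGR")
ORDER_PROMOTING = set("WFYILVCN")


def residue_propensity(aa: str) -> float:
    """Return +1 for disorder-promoting, -1 for order-promoting, 0 otherwise."""
    if aa in DISORDER_PROMOTING:
        return 1.0
    if aa in ORDER_PROMOTING:
        return -1.0
    return 0.0


def disorder_score(seq: str, pos: int, half_window: int = 3) -> float:
    """Mean toy propensity in a window centred on pos (0-based), clipped at the ends."""
    start = max(0, pos - half_window)
    end = min(len(seq), pos + half_window + 1)
    window = seq[start:end]
    return sum(residue_propensity(a) for a in window) / len(window)


def delta_score(seq: str, pos: int, new_aa: str, half_window: int = 3) -> float:
    """DS under the assumed convention: score(mutant) - score(wild type) at pos."""
    if seq[pos] == new_aa:
        raise ValueError("substitution must change the residue")
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    return (disorder_score(mutant, pos, half_window)
            - disorder_score(seq, pos, half_window))
```

Under this convention a negative DS means the substitution shifts the local region toward order and a positive DS toward disorder. For example, `delta_score("LLLPLLL", 3, "W")` replaces a proline with a tryptophan inside an order-promoting context and yields a negative value. A real analysis would replace `disorder_score` with the output of an actual disorder predictor.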
Keywords/Search Tags: Intrinsic disorder, Latent variable model, Function prediction, Protein-protein interaction networks, Single nucleotide variation