
Classification of Ordered/Disordered Regions of Intrinsically Disordered Proteins Based on DNA Sequence Analysis

Posted on: 2017-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: T X Sui
Full Text: PDF
GTID: 2310330482490327
Subject: Microbiology
Abstract/Summary:
Intrinsically disordered proteins (IDPs), which lack a stable three-dimensional structure, are widespread among natural proteins. IDPs have important biological functions in a variety of physiological and pathological processes. Because of their flexible structure, experimental studies of IDPs are very difficult, and many of their biological mechanisms are still unclear. Prediction algorithms based on bioinformatics methods therefore provide important tools for IDP research. Researchers have developed a number of IDP prediction algorithms since IDPs were discovered, but the efficiency of most of them is not high, and the results of different algorithms differ considerably. A more comprehensive and deeper investigation of IDP sequence characteristics is therefore needed. The existing IDP prediction algorithms are all based on protein sequence features. To reveal deeper sequence characteristics of IDPs, this study systematically examined the differences between the DNA sequences of ordered and disordered regions. On this basis, a prediction algorithm for classifying the ordered/disordered regions of IDPs was developed, providing a new idea for future prediction. The main work of this thesis is as follows:

1. Sequence analysis of ordered/disordered regions of IDPs at the DNA level

Based on the annotation information in the latest version of the IDP database DisProt 6.02, the original genomic DNA sequences of IDPs were obtained from the EMBL Nucleotide database. After redundancy removal and other processing, a final dataset containing 1063 ordered regions and 499 disordered regions was built.
Based on this dataset, the differences between the sequence characteristics of ordered and disordered regions were studied in depth using the CGR (Chaos Game Representation) model, codon bias analysis and other methods. The results showed that ordered and disordered regions differed to varying degrees and showed certain preferences in single-nucleotide, dinucleotide and codon usage, among other aspects. Further analysis showed that disordered regions had a wide range of codon usage, whereas ordered regions had similar codon usage. Differential analysis of the relative GC content and of the purine and pyrimidine content at each codon position also showed characteristic differences between ordered and disordered regions. In addition, the areas connecting ordered and disordered regions were analyzed; the result showed that disordered regions had relatively conserved sequence features. Taken together, these studies showed that the ordered and disordered regions of IDPs differ to varying degrees in their sequence characteristics at the DNA level, laying a theoretical basis for IDP prediction.

2. IDP ordered/disordered region classification algorithm based on protein and DNA sequence characteristics

To describe the characteristic differences between ordered and disordered regions quantitatively, three groups of features were selected as input parameters of the classification algorithm: (1) the frequencies of single nucleotides, dinucleotides and trinucleotides in the DNA sequence; (2) a 75-D vector derived from the TN curve and the Z curve, describing the composition and arrangement of trinucleotides; and (3) the frequencies of the 20 amino acids and 400 dipeptides commonly used in IDP prediction algorithms. Combining these features with a support vector machine (SVM), we developed a classification algorithm for the ordered/disordered regions of IDPs.
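The first feature group and the codon-position GC analysis described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names are hypothetical, k-mers are counted with overlapping windows (the abstract does not say whether windows overlap), and sequences are assumed to start in reading frame.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def kmer_frequencies(seq, k):
    """Relative frequency of every k-mer over A/C/G/T, in a fixed order.

    Assumption: overlapping windows; k-mers with other symbols are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def gc_by_codon_position(seq):
    """GC fraction at codon positions 1, 2 and 3.

    Assumption: seq starts in frame; trailing bases of an incomplete
    codon are dropped.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    fractions = []
    for pos in range(3):
        bases = seq[pos:usable:3]
        gc = sum(b in "GC" for b in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return fractions


def dna_feature_vector(seq):
    """Concatenate mono-, di- and trinucleotide frequencies (4 + 16 + 64 values)."""
    features = []
    for k in (1, 2, 3):
        features.extend(kmer_frequencies(seq, k))
    return features
```

A vector like the one returned by `dna_feature_vector` is the kind of fixed-length input an SVM classifier expects; the 75-D TN-Z vector and the amino acid and dipeptide frequencies would be appended in the same way.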
The prediction results on different datasets indicated that the 75-D vector based on the TN curve and the Z curve represented the characteristic differences between ordered and disordered regions well. Further integration of protein and DNA sequence characteristics can provide new ideas for IDP prediction and related research in the future.
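For readers unfamiliar with the Z curve mentioned above, the sketch below computes the standard cumulative Z-curve coordinates of a DNA sequence. It only illustrates the underlying transform; the abstract does not give the construction of the 75-D TN-Z vector, so that vector is not reproduced here. The function name `z_curve` and the choice to skip ambiguous bases are assumptions.

```python
def z_curve(seq):
    """Return cumulative Z-curve points (x_n, y_n, z_n), one per valid base.

    x_n = (A + G) - (C + T)   purine vs pyrimidine
    y_n = (A + C) - (G + T)   amino vs keto
    z_n = (A + T) - (G + C)   weak vs strong hydrogen bonding
    where A, C, G, T are the base counts in the first n positions.
    """
    x = y = z = 0
    points = []
    for base in seq.upper():
        if base not in "ACGT":
            continue  # assumption: ambiguous symbols such as N are skipped
        x += 1 if base in "AG" else -1
        y += 1 if base in "AC" else -1
        z += 1 if base in "AT" else -1
        points.append((x, y, z))
    return points
```

For example, `z_curve("ACGT")` ends at `(0, 0, 0)` because the four bases occur equally often, while a GC-rich stretch drives the z coordinate steadily negative.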
Keywords/Search Tags:Intrinsically disordered proteins, DNA sequence, SVM, TN-Z curve