
Research On Several Problems For Protein And RNA

Posted on: 2012-07-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S L Zhang
GTID: 1100330335954644
Subject: Applied Mathematics
Abstract/Summary:
With the further implementation of the Human Genome Project, biological data are accumulating at an accelerating rate. This places new and higher demands on scientific analytical methods and practical tools for biological data. The deluge of databases, in turn, raises new questions, such as how to analyze, process and store these data, which pose serious challenges to computer science, mathematics and other disciplines. Meanwhile, many researchers from these fields have been attracted by these challenges and have become interested in the life sciences. Bioinformatics has thus emerged as a new and developing interdisciplinary field. Its research area is very broad; this dissertation mainly studies several problems concerning two important research objects in this area, proteins and RNA. The main results can be summarized as follows:
1. In chapter 2, the amino acid sequence was first mapped into a hydrophobicity sequence, which was then processed by the discrete Fourier transform and the continuous wavelet transform to extract features of protein secondary structure from the hydrophobicity sequence. In particular, by using the continuous wavelet transform to pre-extract the high-frequency part, we obtained the periodic features of amino acid sequences effectively. Finally, an online software platform was built so that the research results can be widely applied.
2. In chapter 3, for protein secondary structure comparison, the transition probability matrix and structural characteristic vectors of proteins were constructed, and the FDOD score scheme was then developed to measure similarity. Compared with the traditional alignment-based method, this new approach produced a more reasonable grouping by protein structural class.
For protein secondary structure class prediction, 11 features were used to reflect the general contents and spatial arrangements of the secondary structural elements of a given protein sequence; three of these features were specially designed to improve the prediction accuracy for proteins from the α/β and α+β classes.
3. In chapter 4, two novel approaches to the phylogenetic analysis of protein sequences were proposed. In the first approach, we took the physicochemical properties of amino acids into account and introduced protein feature sequences into phylogenetic analysis by using conditional LZ complexity. Tests on real datasets showed that this approach captures the evolutionary information in the protein feature sequences. In the second approach, we constructed characteristic vectors from the protein feature sequences and revealed evolutionary relationships by using the Bhattacharyya distance. Our methods may therefore be used to complement existing phylogenetic analysis.
4. In chapter 5, a complexity-based method to compare RNA secondary structures (taking pseudoknots into account) was proposed. By incorporating base-pairing information into the primary sequences, we transformed complex RNA secondary structures into linear characteristic sequences and then compared the secondary structures by using conditional LZ complexity. Finally, similarity analysis and phylogenetic analysis of two sets of RNA secondary structures were presented.
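The hydrophobicity mapping and power-spectrum step summarized for chapter 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's software: it assumes the Kyte-Doolittle hydropathy scale (the abstract does not name a scale), uses a direct O(N²) discrete Fourier transform rather than an FFT library, and omits the continuous wavelet transform pre-extraction entirely. The names `hydrophobicity_sequence`, `power_spectrum` and `dominant_period` are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values. ASSUMPTION: the dissertation does not
# state which hydrophobicity scale it uses.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_sequence(protein):
    """Map an amino acid string to a numeric hydrophobicity signal."""
    return [KYTE_DOOLITTLE[aa] for aa in protein.upper()]


def power_spectrum(signal):
    """Periodogram |X_k|^2 / N for k = 0..N//2, using a direct DFT.

    The mean is subtracted first so the k = 0 term does not dominate.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spectrum.append((re * re + im * im) / n)
    return spectrum


def dominant_period(protein):
    """Period (in residues) of the strongest non-zero frequency.

    Needs at least two residues so that a non-zero frequency exists.
    """
    ps = power_spectrum(hydrophobicity_sequence(protein))
    k_best = max(range(1, len(ps)), key=lambda k: ps[k])
    return len(protein) / k_best
```

For a synthetic sequence with a strict four-residue hydrophobic pattern, `dominant_period("LLEE" * 9)` returns 4.0. In real α-helices the corresponding periodicity lies near 3.6 residues per turn, which is the kind of signal such a spectrum is meant to expose.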
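The transition probability matrix mentioned for chapter 3 can be illustrated on a three-state secondary structure string (H = helix, E = strand, C = coil). This sketch assumes that three-letter alphabet and first-order, row-normalized transitions; the dissertation's exact characteristic vectors and the FDOD score itself are not reproduced here, and `transition_matrix` and `characteristic_vector` are hypothetical names.

```python
STATES = "HEC"  # ASSUMED three-state alphabet: helix, strand, coil


def transition_matrix(ss):
    """Row-normalized first-order transition probabilities between states.

    Entry [a][b] estimates P(next state = b | current state = a).
    A state with no outgoing transitions gets an all-zero row.
    """
    index = {s: i for i, s in enumerate(STATES)}
    counts = [[0] * len(STATES) for _ in STATES]
    for a, b in zip(ss, ss[1:]):
        counts[index[a]][index[b]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def characteristic_vector(ss):
    """HYPOTHETICAL feature vector: state composition followed by the
    flattened transition matrix (3 + 9 = 12 numbers)."""
    composition = [ss.count(s) / len(ss) for s in STATES]
    flat = [p for row in transition_matrix(ss) for p in row]
    return composition + flat
```

Vectors built this way have a fixed length regardless of protein length, which is what makes an alignment-free comparison between proteins possible.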
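The conditional LZ complexity used in chapters 4 and 5 builds on the Lempel-Ziv (1976) complexity of a string, the number of components in its exhaustive history. The sketch below computes that count directly and, as an assumption, approximates the conditional complexity c(S|Q) as c(QS) − c(Q), combining the two directions into a normalized distance of the form max(c(S|Q), c(Q|S)) / max(c(S), c(Q)). The dissertation's exact normalization is not given in the abstract, and the function names are hypothetical.

```python
def lz_complexity(s):
    """Lempel-Ziv (1976) complexity: number of components in the exhaustive
    history of s. Each component is the longest prefix of the remaining text
    that can be copied from earlier text (overlap allowed), plus one new
    symbol; the final component may end without a new symbol."""
    n = len(s)
    c, p = 0, 0
    while p < n:
        length = 1
        # Grow the candidate while s[p:p+length] already occurs in s[:p+length-1].
        while p + length <= n and s[p:p + length] in s[:p + length - 1]:
            length += 1
        c += 1
        p += length
    return c


def conditional_lz(s, q):
    """ASSUMED approximation of c(S|Q): the extra components needed for S
    once Q has been seen, i.e. c(QS) - c(Q)."""
    return lz_complexity(q + s) - lz_complexity(q)


def lz_distance(s, q):
    """Normalized distance max(c(S|Q), c(Q|S)) / max(c(S), c(Q)).

    At least one of the two strings must be non-empty.
    """
    return max(conditional_lz(s, q), conditional_lz(q, s)) / max(
        lz_complexity(s), lz_complexity(q))
```

The classic example `0001101001000101` parses as 0·001·10·100·1000·101, so `lz_complexity` returns 6. In chapter 4 the inputs would be protein feature sequences, and in chapter 5 the linear characteristic sequences derived from RNA structures.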
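The Bhattacharyya distance used in chapter 4's second approach compares two non-negative characteristic vectors after treating each as a probability distribution: D_B(p, q) = −ln Σᵢ √(pᵢ qᵢ). The sketch below assumes the vectors are normalized to sum to one before comparison; how the dissertation builds its characteristic vectors from feature sequences is not stated in the abstract, and `bhattacharyya_distance` is a hypothetical name.

```python
import math


def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln(sum_i sqrt(p_i * q_i)) after normalizing p and q.

    Inputs must be non-negative vectors of equal length with positive sums.
    Returns math.inf when the normalized vectors share no support.
    """
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in p) or any(v < 0 for v in q):
        raise ValueError("entries must be non-negative")
    sp, sq = sum(p), sum(q)
    if sp == 0 or sq == 0:
        raise ValueError("vectors must have a positive sum")
    # Bhattacharyya coefficient of the two normalized distributions.
    bc = sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))
    if bc == 0:
        return math.inf
    # Clamp so floating-point rounding above 1 cannot yield a negative distance.
    return -math.log(min(bc, 1.0))
```

Because of the normalization, vectors that differ only by a scale factor, such as `[1, 2, 3]` and `[2, 4, 6]`, are at distance 0, so sequences of different lengths can be compared through their relative frequencies.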
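The abstract does not say which tree-building method turns pairwise distances into phylogenetic trees. As one common choice, assumed here purely for illustration, UPGMA can be sketched as follows. It merges the two closest clusters at each step and uses size-weighted averages for the new distances. Branch lengths are omitted, and `upgma` is a hypothetical name.

```python
def upgma(names, dist):
    """Build a Newick topology (no branch lengths) by UPGMA.

    names: list of taxon labels; dist: symmetric matrix of pairwise distances.
    At each step the two closest clusters are merged, and distances to the
    new cluster are size-weighted averages of the old ones.
    """
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (newick, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # keys are stored as (smaller id, larger id)
        (ti, si), (tj, sj) = clusters.pop(i), clusters.pop(j)
        new_d = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new_d[(k, next_id)] = (dik * si + djk * sj) / (si + sj)
        # Drop every distance involving the two merged clusters.
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        d.update(new_d)
        clusters[next_id] = (f"({ti},{tj})", si + sj)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

Fed a matrix of LZ-based or Bhattacharyya distances, this yields a topology such as `(C,(A,B));` when A and B are the closest pair. Neighbor joining is a frequent alternative when evolutionary rates are not assumed to be equal.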
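Chapter 5 turns RNA secondary structures, including pseudoknots, into linear characteristic sequences by adding base-pairing information to the primary sequence. The abstract does not give the encoding, so the sketch below uses a deliberately simple stand-in, labeled hypothetical: it reads extended dot-bracket notation (assumed here, with `()`, `[]`, `{}` and `<>` as independent bracket types so crossing pairs can be written) and writes paired bases in upper case and unpaired bases in lower case. Canonical pairing rules are not checked.

```python
OPENERS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def base_pairs(structure):
    """Return sorted (i, j) index pairs from extended dot-bracket notation.

    Each bracket type has its own stack, so pseudoknots written with a
    second bracket type (crossing pairs) are parsed correctly.
    """
    stacks = {o: [] for o in OPENERS}
    pairs = []
    for i, ch in enumerate(structure):
        if ch in OPENERS:
            stacks[ch].append(i)
        elif ch in CLOSERS:
            stack = stacks[CLOSERS[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {i}")
    for o, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {o!r} at position {stack[-1]}")
    return sorted(pairs)


def characteristic_sequence(sequence, structure):
    """HYPOTHETICAL encoding: paired bases upper case, unpaired lower case."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    paired = set()
    for i, j in base_pairs(structure):
        paired.update((i, j))
    return "".join(b.upper() if k in paired else b.lower()
                   for k, b in enumerate(sequence))
```

For a simple hairpin, `characteristic_sequence("GGGAAACCC", "(((...)))")` gives `GGGaaaCCC`. The resulting strings could then be compared with a conditional LZ complexity measure, as the abstract describes for chapter 5.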
Keywords/Search Tags: Protein secondary structure comparison, Protein secondary structure class prediction, RNA secondary structure comparison, Power spectrum density, Sequence complexity, Phylogenetic tree