
A Novel Representation Of Protein Sequences

Posted on: 2012-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: P P Qian
Full Text: PDF
GTID: 2210330338464152
Subject: Operational Research and Cybernetics
Abstract/Summary:
Although genomics is at the core of bioinformatics research, proteomics is also very important to the field. On the one hand, genes, as carriers of the information of life, must transfer that information to proteins, so proteins are the real bearers of life activities. On the other hand, although genomics provides strong evidence for gene activity and its correlation with disease, most diseases are in fact not caused by changes in genes. Proteomics research can therefore provide not only a material basis for discovering the laws of life activities, but also a theoretical basis and solutions for understanding the pathogenesis of diseases and overcoming them. Accordingly, with the completion of the Human Genome Project, scientists proposed the Subsequent Human Genome Project, in which proteomics research plays a very important part. Although the Human Genome Project has laid out the blueprint of the human body, fully understanding the complex human body requires learning more about all the proteins that genes produce.

In proteomics, the main task of bioinformatics is to analyze and predict protein structure and to apply that structural knowledge in bioinformatics, medicine, pharmacy, and other life science fields. A protein's sequence corresponds to its structure, and this view has become the premise for protein structure prediction. On this premise, researchers calculate the evolutionary distances among different species by analyzing their protein sequences, determine the distance relationships among those sequences, and finally identify homologous proteins. Because homologous proteins have similar structures, the structure of a protein whose structure is unknown can be predicted by analyzing its similarity to a homologous protein. However, analyzing protein sequences is more difficult than analyzing DNA sequences.
One reason is that a DNA sequence consists of only 4 bases, whereas a protein sequence consists of 20 different amino acids; another important reason is that the relationships among the 20 amino acids are very complex. Assessing the similarity of protein sequences is therefore not easy. The key to solving this problem is obtaining an effective representation of protein sequences, and building a mathematical model of protein sequences is a good way to address this biological issue.

The content of this thesis can be divided into two parts. In part one, we propose a novel representation of protein sequences and examine its effectiveness. In part two, we apply this method to analyzing avian influenza virus sequences and predicting the subcellular location of apoptosis proteins. Compared with previous analysis methods, the results show that our method is very effective. The thesis is organized into four chapters, with the following work and contributions:

In Chapter 1, we give an overview of bioinformatics and its prospects, and explain the research significance of protein sequence analysis.

In Chapter 2, we first give a new method for representing protein sequences; we then carry out two experiments to examine the effectiveness of this representation and obtain the desired results.

In Chapter 3, as an application, we first use this method to analyze the similarity of 123 H5N1 avian influenza virus (AIV) protein sequences, obtaining results similar to those of previous studies. We then apply the representation to predicting the subcellular location of apoptosis proteins, again obtaining good results. All results show the value of our method.

In Chapter 4, we point out the shortcomings of this work and give our conclusions and outlook.
Keywords/Search Tags:Bioinformatics, Proteomics, protein sequence, structure of protein sequence, function of protein sequence