
Protein Sequence Coding And Protein Functional Class Prediction

Posted on: 2012-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: Q F Liu
GTID: 2230330395985661
Subject: Computer Science and Technology
Abstract/Summary:
Since the Human Genome Project was carried out, the volume of biological sequence data has grown explosively every year, and genome research has entered the post-genome era. The amount of nucleic acid and amino acid sequence data is increasing constantly, yet little is known about the functions of the proteins that take part in life activities. Protein function prediction has therefore become one of the main tasks. Because of the vast amount of sequence data, traditional experimental methods cannot meet the requirements of sequence analysis. The selection, analysis, processing and annotation of amino acid sequences have thus become both hot topics and difficult problems. This thesis mainly discusses the prediction of protein functional class. The main work is summarized as follows:

This thesis proposes a data selection approach for protein functional class prediction. Selecting the training dataset is an essential step in protein functional class prediction, and the training dataset is usually chosen to be as large as possible. We examined this practice. We first reorder all protein sequences according to their length. Following this new order, we draw a series of sample datasets of the same size and predict protein function with each of them. The predictions on these samples show that sequence length and sequence characteristics influence the prediction of protein function. We therefore present an approach that, based on the length of the test protein sequence, chooses the nearest proteins from the samples as the training dataset, so that the resulting training dataset is small. We compare it with the common method on different training datasets, using Profile Coding and the nearest neighbor algorithm (NNA) classifier. The prediction rates are almost the same. The experimental results show that this method of selecting the training dataset is feasible.

This thesis also presents a protein functional class prediction method based on clustering. To extract more characteristic information from protein sequences, we discuss and compare two codings. We then chose the ProfileAA coding, which integrates amino acid composition information with amino acid physical and chemical property information. Comparing this coding with three other codings, we found it to be reasonable. We then predict protein functional class based on Shortest Path Clustering. To evaluate this method, we predict protein functional class in two ways, one with Shortest Path Clustering and one without clustering. We also compare our clustering-based protein functional class prediction method with other authors' methods. Finally, the experimental results indicate that the prediction rate of protein functional class is higher when based on Shortest Path Clustering.
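The length-based selection of the training dataset described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not define Profile Coding or ProfileAA coding, so plain 20-dimensional amino acid composition is swapped in as the feature vector, Euclidean distance is assumed for the nearest neighbor step, and the function names, the parameter `k`, and the toy sequences and class labels are all hypothetical.

```python
from collections import Counter
import math

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Amino acid composition: fraction of each of the 20 residues.

    Assumption: a stand-in for the thesis's Profile Coding, which the
    abstract does not specify.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def select_by_length(train, test_seq, k):
    """Keep the k (sequence, label) pairs closest in length to test_seq."""
    ranked = sorted(train, key=lambda pair: abs(len(pair[0]) - len(test_seq)))
    return ranked[:k]


def nearest_neighbor(train, test_seq):
    """Return the label of the training sequence nearest in composition."""
    query = composition(test_seq)
    best = min(train, key=lambda pair: math.dist(composition(pair[0]), query))
    return best[1]


def predict(train, test_seq, k):
    """Nearest neighbor prediction on the length-selected training subset."""
    return nearest_neighbor(select_by_length(train, test_seq, k), test_seq)


# Toy data: sequences and labels are invented for illustration only.
train = [
    ("AAAAGGGG", "class_a"),
    ("AAAGGGGA", "class_a"),
    ("KKKKRRRRDD", "class_b"),
    ("KKRRDDEEKK", "class_b"),
    ("AAAAAAAAAAAAAAAAAAAAGGGGGGGGGG", "class_a"),
    ("KRDEKRDEKRDEKRDEKRDEKRDEKRDEKRDE", "class_b"),
]

print(predict(train, "AAGGAGGA", k=3))
```

Here only the three training sequences closest in length to the test sequence are compared against it, which mirrors the idea of a small, length-matched training dataset. Whether such a subset predicts as well as the full dataset is the empirical question the thesis reports on.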
Keywords/Search Tags: Anomic sequence, ProfileAA coding, Protein functional class prediction, Nearest neighbor algorithm, Shortest Path algorithm