The Research On Sequence Encoding And Prediction Algorithm Of Protein Subcellular Location

Posted on: 2011-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2120360308968976
Subject: Computer Science and Technology
Abstract/Summary:
The successful completion of the Human Genome Project laid the foundation for determining the whole human genome sequence. To understand how the various activities of life are carried out, however, we need to study proteins, which perform those activities. Biological studies show that there is an intrinsic relationship between a protein's function and its subcellular location, so information about subcellular location provides useful clues for research on protein function. Identifying subcellular location has therefore become an important research area in proteomics. Focusing on the prediction of protein subcellular location, this thesis studies protein sequence encoding and the design of classification algorithms. The main research achievements are as follows:

1. The thesis proposes a novel method for protein sequence encoding. According to the physical and chemical properties of amino acids, the twenty amino acids are classified into six categories. To capture the local features of a protein sequence, each sequence is divided into parts of equal length. A support vector machine is then used to predict the subcellular location of apoptosis proteins. Tests on three different apoptosis protein datasets show that the proposed method achieves better predictive performance.

2. Based on the position distribution of hydrophobic amino acids and of the other categories of amino acids in a protein sequence, the thesis proposes a new pseudo amino acid composition for representing protein sequences. Then, exploiting the diversity and differences among the feature information of protein sequences, an ensemble classifier is constructed by fusing diverse support vector machine classifiers. The ensemble classifier combines multiple kinds of feature information and reduces the uncertainty of any individual classifier. It is used to predict the subcellular location of eukaryotic and prokaryotic protein sequences. The experimental results show that overall prediction accuracy on both datasets is significantly improved and that the method has a clear advantage over other methods.
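
As a rough illustration of the two ideas summarized in items 1 and 2, the sketch below encodes a sequence by reduced-alphabet composition per equal-length segment and fuses several classifier outputs. It is not the thesis's implementation: the abstract does not list the six amino acid categories or the fusion rule, so the grouping in `GROUPS`, the function names `segment_composition` and `majority_vote`, and the use of simple majority voting are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical six-class grouping of the 20 standard amino acids by
# physicochemical character; the abstract does not give the actual classes.
GROUPS = ["AVLIM", "FWY", "STNQ", "KRH", "DE", "CGP"]
AA_TO_GROUP = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def segment_composition(seq, n_segments):
    """Split seq into n_segments near-equal parts and return the
    frequency of each amino acid group within each part."""
    seq = seq.upper()
    features = []
    for i in range(n_segments):
        # Integer boundaries give equal parts when len(seq) is divisible
        # by n_segments, and near-equal parts otherwise.
        start = i * len(seq) // n_segments
        end = (i + 1) * len(seq) // n_segments
        counts = [0] * len(GROUPS)
        for aa in seq[start:end]:
            g = AA_TO_GROUP.get(aa)
            if g is not None:  # skip non-standard residues such as X
                counts[g] += 1
        total = sum(counts)
        features.extend(c / total if total else 0.0 for c in counts)
    return features


def majority_vote(labels):
    """Fuse per-classifier predictions by majority vote (an assumed rule)."""
    return Counter(labels).most_common(1)[0][0]


vector = segment_composition("AAFFKKDD", n_segments=2)
fused = majority_vote(["nucleus", "cytoplasm", "nucleus"])
```

Each sequence becomes a fixed-length vector of 6 × `n_segments` values, which could then be passed to any support vector machine implementation. In an ensemble, each base classifier would be trained on a different feature representation and their predicted labels combined with a rule such as `majority_vote`.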
Keywords/Search Tags: Subcellular location, Sequence encoding, Pseudo amino acid composition, Support vector machine, Ensemble classifier