
Predicting Protein Subcellular Localization By Error-correcting Output Coding

Posted on: 2014-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: L L Guo
Full Text: PDF
GTID: 2250330425981038
Subject: Computer application technology
Abstract/Summary:
Protein subcellular localization prediction is the task of determining a protein's specific subcellular location from its primary amino acid sequence. It rests on the widely accepted biological view that sequence determines structure and structure determines function; because a protein's function is closely tied to its subcellular location, prediction can start from the original sequence. Predicting localization is of great significance: it helps in understanding protein function, how proteins carry out their biological roles, and protein-protein interactions, and it also sheds light on disease mechanisms and supports the development of new drugs for treatment. The Human Genome Project has produced large amounts of data, so there is an urgent need to study subcellular localization with bioinformatics methods.

This paper studies how to build a prediction model that classifies unknown sequences accurately. The main contents are dataset selection and construction, feature extraction from amino acid sequences, and the design and analysis of the predictive model. The focus of the study is feature extraction and model design.

The main characteristics related to localization must be extracted before classification; this step converts the amino acid alphabet into numerical information that a computer can process. The choice of extraction method is critical, because the information captured by different methods differs greatly. Commonly used methods include the amino acid composition model, the dipeptide composition model, the pseudo amino acid composition model, and fusions of multiple features.
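
As a rough illustration of the feature extraction step described above, the following Python sketch computes amino acid composition (20 values) and dipeptide composition (400 values) for a sequence and concatenates them as one simple form of feature fusion. The function names and the concatenation-based fusion are assumptions made for illustration; the abstract does not specify the implementation, and pseudo amino acid composition is left out here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """Return the relative frequency of each standard residue (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the relative frequency of each adjacent residue pair (400 values).

    Pairs containing a non-standard symbol are skipped (an assumption).
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def fused_features(seq):
    """Hypothetical fusion: concatenate both descriptors into one 420-value vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

Calling `fused_features` on a sequence string yields a fixed-length numeric vector regardless of sequence length, which is what allows sequences of different lengths to be fed to the same classifier.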
This paper fuses the single-feature methods above, and experiments show that feature fusion improves prediction performance to a certain degree.

Protein subcellular localization is a typical multi-class classification problem. The usual treatment is to convert it into a set of binary classification problems, which can then be handled by commonly used classifiers such as the Support Vector Machine (SVM), Artificial Neural Network (ANN), and Flexible Neural Tree (FNT). The key is how to construct a classification model that handles the multi-class problem. This paper uses Error-Correcting Output Coding (ECOC) to classify multi-label sequences, with ANN and FNT as base classifiers; ECOC is also improved to address the imbalanced-data problem, with good results.
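
The general ECOC idea can be sketched in Python as follows: each class is assigned a binary codeword, one binary classifier is trained per codeword bit, and a new sample is assigned to the class whose codeword is closest in Hamming distance to the predicted bits. This is only a minimal sketch under stated assumptions: the 4-class, 7-bit code matrix is a hypothetical example, and a 1-nearest-neighbour learner stands in for the ANN and FNT base classifiers named above. It does not reproduce the improved ECOC for imbalanced data, whose details are not given here.

```python
# Hypothetical 7-bit exhaustive code for 4 classes.
# Rows are class codewords; each column defines one binary task.
CODE = {
    0: (1, 1, 1, 1, 1, 1, 1),
    1: (0, 0, 0, 0, 1, 1, 1),
    2: (0, 0, 1, 1, 0, 0, 1),
    3: (0, 1, 0, 1, 0, 1, 0),
}


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_1nn(X, bits):
    """Stand-in binary learner (not ANN or FNT): memorise the samples and their bits."""
    return list(zip(X, bits))


def predict_1nn(model, x):
    """Return the bit of the training sample nearest to x."""
    return min(model, key=lambda pair: sq_dist(pair[0], x))[1]


def train_ecoc(X, y, code=CODE):
    """Train one binary model per code column by relabelling y with that column's bits."""
    n_bits = len(next(iter(code.values())))
    return [fit_1nn(X, [code[label][j] for label in y]) for j in range(n_bits)]


def decode(bits, code=CODE):
    """Return the class whose codeword has the smallest Hamming distance to bits."""
    return min(code, key=lambda c: sum(b != w for b, w in zip(bits, code[c])))


def predict_ecoc(models, x, code=CODE):
    """Collect one predicted bit per binary model, then decode to a class label."""
    return decode([predict_1nn(m, x) for m in models], code)
```

Because any two codewords in this example matrix differ in four positions, a single wrong bit from one binary classifier still decodes to the correct class, which is the error-correcting property that gives the method its name.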
Keywords/Search Tags: predicting subcellular localization, ECOC, feature extraction, ANN, Particle Swarm Optimization (PSO), Flexible Neural Tree (FNT)