
The Building Of Apoptosis Protein Database And Prediction Of Subcellular Location

Posted on: 2015-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: M L Song
Full Text: PDF
GTID: 2180330467466062
Subject: Physical Electronics
Abstract/Summary:
At the end of the 20th century, more and more protein and DNA sequences were deposited in public databases, and human society began to move into the post-genomic era. Bioinformatics, the core technology of the post-genomic era, aims to analyze and interpret the structural and functional information contained in these sequences. Because the function of a protein is closely related to its subcellular localization, predicting protein subcellular localization has become a key research topic in the post-genomic era. This thesis studies the prediction of the subcellular localization of apoptosis proteins, a class of proteins associated with many diseases. Information on their subcellular localization helps in studying the mechanism of apoptosis and the function of apoptosis proteins, and it also contributes to the development of new drugs and to the understanding of disease mechanisms. The first step in predicting apoptosis protein subcellular localization is to establish protein data sets organized by subcellular location.
Feature extraction from protein sequences and the choice of classification algorithm are the key issues in protein subcellular location prediction. Based on the UniProtKB/Swiss-Prot database, this thesis establishes apoptosis-887, a data set covering six classes of eukaryotic apoptosis proteins. It proposes a new protein sequence feature extraction method based on the position-specific amino acid frequency distributions of the N-terminal and C-terminal regions and on adjacent (nearest-neighbor) dipeptide frequencies derived from physicochemical properties. Four approaches are applied to predict the subcellular location of the proteins in apoptosis-887: the increment of diversity method, the support vector machine (SVM) method, the increment of diversity combined with SVM, and a combined classifier. Under 5-fold cross-validation, the overall prediction accuracies were 68.77%, 75.87%, 76.44%, and 79.26%, respectively, so the combined classifier predicts better than any single classifier. The research shows that: (1) Starting from the primary sequence of an apoptosis protein, characterizing the sequence with a multidimensional combination of features, that is, a strategy that integrates several kinds of feature parameters, can improve prediction accuracy. (2) The position-specific amino acid frequency distributions of the N-terminal and C-terminal regions are the main characteristic parameters of apoptosis proteins. (3) A combined classifier can merge the advantages of the individual classifiers and reduce the variance caused by conflicts among them, effectively improving recognition accuracy.
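The abstract gives no formulas, so the following is only a minimal sketch of how an increment-of-diversity classifier and a combined classifier might look. It assumes the commonly cited definitions D(X) = N log N - sum_i n_i log n_i and ID(X, S) = D(X + S) - D(X) - D(S), uses plain adjacent-residue dipeptide counts in place of the thesis's physicochemical and terminal-position features, and uses simple majority voting as the combination rule. All function names and the toy sequences are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_counts(seq):
    """Count adjacent residue pairs; pairs with non-standard letters are ignored."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return [counts.get(d, 0) for d in DIPEPTIDES]


def diversity(counts):
    """Assumed form: D(X) = N log N - sum_i n_i log n_i, with 0 log 0 = 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, s):
    """Assumed form: ID(X, S) = D(X + S) - D(X) - D(S). Smaller means more similar."""
    merged = [a + b for a, b in zip(x, s)]
    return diversity(merged) - diversity(x) - diversity(s)


def predict_location(seq, class_profiles):
    """Assign the class whose summed training counts give the smallest ID."""
    x = dipeptide_counts(seq)
    return min(class_profiles,
               key=lambda loc: increment_of_diversity(x, class_profiles[loc]))


def majority_vote(predictions):
    """Hypothetical combination rule: the most frequent label wins."""
    return Counter(predictions).most_common(1)[0][0]


# Toy class profiles built from made-up sequences (illustration only).
profiles = {
    "cytoplasm": dipeptide_counts("AAAAAAAA"),
    "nucleus": dipeptide_counts("KRKRKRKR"),
}
label = predict_location("AAAA", profiles)
```

In practice each class profile would be the sum of the feature counts over all training sequences of that class, and the labels passed to `majority_vote` would come from the individual classifiers, for example the increment-of-diversity predictor and one or more SVMs.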
Keywords/Search Tags: Eukaryotic Cells, Apoptosis Protein, Subcellular Location, Single Classifier, Combined Classifier