
Study On Feature Extraction And Prediction Algorithm For Subcellular Localization Of Gram-positive Bacterial Protein

Posted on: 2021-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: X W Gao
GTID: 2370330605973565
Subject: Biophysics
Abstract/Summary:
Bacteria are everywhere in daily life. Bacteria that stain purple with the Gram stain are Gram-positive, while those that stain red are Gram-negative; bacteria identified as acid-fast by Ziehl-Neelsen staining include Mycobacterium tuberculosis. Many cases of disease are caused by Gram-positive bacteria and Mycobacterium tuberculosis, so studying the subcellular localization of their proteins is of great significance for treating these diseases. In this work, a dataset of Gram-positive bacterial proteins with sequence similarity below 0.25 was built from the latest UniProtKB/Swiss-Prot protein database. The dataset contains proteins from four locations: cell wall, cell membrane, cytoplasm, and extracellular. The subcellular localization of Gram-positive bacterial proteins was predicted on this dataset. First, the domain information of each class of Gram-positive bacterial protein was extracted, and the structure and function of these domains were analyzed. Amino acid composition (AAC), dipeptide composition (DC), hydropathy dipeptide composition (hpDC), gene ontology annotations (GO), average chemical shift (acACS), and domain information (DI) were selected as feature parameters, and the subcellular locations were predicted with a support vector machine (SVM). Among the single features, amino acid composition gave the best prediction, with an overall accuracy of 74.6%. Combined features performed better than any single feature: the combination AAC+DC+hpDC reached 86.1%.

The subcellular localization of Mycobacterium tuberculosis proteins was also predicted. The Mycobacterium tuberculosis dataset constructed by Fan and Li in 2012 was updated, yielding 435 protein sequences in four categories; this dataset is called N435. Four features were then extracted. Among the single features, gene ontology reached a prediction accuracy of 81.1%. The results showed that combined feature parameters again outperformed single features, with an overall accuracy of 87.6% under the jackknife test using the support vector machine algorithm.
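
As a rough illustration of two of the sequence features named above (AAC and DC) and of the jackknife test, the following minimal Python sketch may help. It is not the thesis's implementation: the function names, the handling of non-standard residues, and the nearest-neighbor stand-in classifier (used here instead of an SVM so the sketch needs only the standard library) are all assumptions made for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq):
    """Amino acid composition: fraction of each of the 20 standard residues."""
    # Assumption: non-standard letters (e.g. X, B) are ignored.
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dc(seq):
    """Dipeptide composition: fraction of each of the 400 adjacent residue pairs."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # assumption: skip pairs containing non-standard letters
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def combined_features(seq):
    """Concatenate AAC (20 values) and DC (400 values) into a 420-value vector."""
    return aac(seq) + dc(seq)


def jackknife_accuracy(features, labels, fit_predict):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


def nearest_neighbor(train_x, train_y, x):
    """Stand-in classifier (NOT the SVM used in the thesis): label of the closest vector."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_x]
    return train_y[dists.index(min(dists))]
```

In practice the `fit_predict` argument would wrap an SVM implementation rather than the nearest-neighbor stand-in, and the feature vectors would come from real sequences, for example `combined_features(sequence)` for each protein in the dataset.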
Keywords/Search Tags:Subcellular localization, Gram-positive bacteria, Mycobacterium tuberculosis, Protein sequence, Feature extraction, Support vector machine