Research And Implementation Of The Self Training Algorithm Of Bacterial Essential Genes

Posted on:2017-01-26

Degree:Master

Type:Thesis

Country:China

Candidate:S Luo

Full Text:PDF

GTID:2180330485486563

Subject:Biophysics

Abstract/Summary:

Essential genes are indispensable to an organism: their functions are vital for sustaining life. There are two approaches to predicting and discovering essential genes. The first is experimental, such as RNA interference and single-gene knockout. However, experimental methods are time-consuming and expensive, and because of these drawbacks they have become less applicable to large-scale gene essentiality analysis. Computational methods offer an appealing alternative, predicting essential genes with far less expenditure of resources than their experimental counterparts. Most computational methods to date are integrative: they depend heavily on experimental data, so when such data are unavailable it is difficult for them to predict bacterial essential genes.

To overcome this limitation, we developed a gene essentiality prediction algorithm based on intrinsic gene features. First, we chose protein domains as a feature of the prediction algorithm; experiments confirmed that protein domains play an indispensable role in predicting essential genes. We then selected 25 species as datasets and, using inter-species genetic distance together with protein domain information, designed an essential gene prediction algorithm. We evaluated it on the datasets by cross-validation and calculated the AUC for each species. Of the 25 species, 5 had AUC values above 0.9, 14 between 0.75 and 0.9, and 6 below 0.75, with the lowest being 0.66. These results indicate that the algorithm performs well for most species.

Next, we upgraded Geptop, an essential gene prediction tool based on gene sequence features. Compared with the previous version, the new version has the following improvements: (1) the datasets are extended from 19 species to 25; (2) the scoring formula is simplified so that it is easier to understand; (3) the prediction program is optimized for efficiency. With these upgrades, Geptop's prediction accuracy improved for 12 species in the datasets compared with the older version. To measure running efficiency, we tested the program on E. coli: the running time dropped from 107 minutes to 26 minutes, a roughly fourfold speedup.

Finally, we tried to combine the protein-domain-based prediction method with Geptop to obtain better results. Because of limited time, we did not find a combination that improved the prediction results, but we hope our experience will be useful to other researchers.
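The abstract does not give the scoring formula, so the following is only a minimal Python sketch of how a domain-based predictor weighted by genetic distance could be scored and evaluated with a per-species AUC. Everything concrete here is an assumption: the data layout, the inverse-distance weight 1/d, leave-one-species-out as the cross-validation scheme, the toy data, and the function names (`essential_domains`, `domain_score`, `auc`, `leave_one_species_out`) are all hypothetical and are not the thesis's or Geptop's actual method. The AUC is computed as the Mann-Whitney statistic, i.e., the probability that a randomly chosen essential gene outscores a randomly chosen non-essential one, with ties counted as one half.

```python
from itertools import product

# Hypothetical data layout (not from the thesis):
#   genomes[species][gene_id] = (set of protein-domain IDs, is_essential)
#   dist[frozenset({a, b})]   = genetic distance between species a and b (> 0)


def essential_domains(genome):
    """Domains found in at least one essential gene of a species."""
    return {d for domains, essential in genome.values() if essential for d in domains}


def domain_score(domains, references, distance):
    """Distance-weighted fraction of reference species whose essential genes
    share at least one domain with the query gene (assumed weight: 1 / distance)."""
    total = hit = 0.0
    for species, ess in references:
        weight = 1.0 / distance[species]  # closer species count more
        total += weight
        if domains & ess:  # any shared domain counts as a hit
            hit += weight
    return hit / total if total else 0.0


def auc(scores, labels):
    """ROC AUC as the Mann-Whitney statistic; tied pairs count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both essential and non-essential genes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def leave_one_species_out(genomes, dist):
    """Score each species' genes using all other species as references,
    then return the AUC per species."""
    ess = {sp: essential_domains(g) for sp, g in genomes.items()}
    results = {}
    for query, genome in genomes.items():
        refs = [(sp, ess[sp]) for sp in genomes if sp != query]
        distance = {sp: dist[frozenset((query, sp))] for sp, _ in refs}
        scores, labels = [], []
        for domains, essential in genome.values():
            scores.append(domain_score(domains, refs, distance))
            labels.append(essential)
        results[query] = auc(scores, labels)
    return results


# Toy data, invented purely for illustration.
TOY_GENOMES = {
    "A": {"a1": ({"D1"}, True), "a2": ({"D2"}, False), "a3": ({"D1", "D3"}, True)},
    "B": {"b1": ({"D1"}, True), "b2": ({"D3"}, False), "b3": ({"D4"}, True)},
    "C": {"c1": ({"D4"}, True), "c2": ({"D2"}, False), "c3": ({"D1"}, True)},
}
TOY_DIST = {
    frozenset(("A", "B")): 1.0,
    frozenset(("A", "C")): 2.0,
    frozenset(("B", "C")): 1.0,
}

if __name__ == "__main__":
    print(leave_one_species_out(TOY_GENOMES, TOY_DIST))
    # → {'A': 1.0, 'B': 0.75, 'C': 1.0}
```

On the toy data, species B scores lower because its non-essential gene b2 carries domain D3, which also occurs in an essential gene of species A. Inverse-distance weighting is just one simple way to let closer relatives count more; any weight that decreases with genetic distance would fit the same structure.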

Keywords/Search Tags:

bacteria, essential genes, protein domain

Related items

1	Identifying Essential Proteins Based On Domain Information
2	The Research Of Computational Methods On Theoretical Identification Of Essential Genes
3	Comparative Analysis Of Essential And Nonessential Genes In Escherichia Coli K-12
4	Identifying Protein Complexes And Predicting Disease Genes Based On Protein Domain
5	Research On Identification Of Essential Genes And Prognostic Gene Signatures
6	Research On The Relationship Between Topological Structure And Functional Lethality In Protein-protein Interaction Network
7	Structural and biochemical analysis of the essential spliceosomal protein Prp8
8	Identification Of Bacterial Essential Genes And Analysis Of Evolutionary Characteristics
9	Identify Essential Protein And Protein Complex Algorithms On Protein-Protein Interaction Networks
10	Research On Essential Genes Recognition Based On Sequence Information