
The Research Of Computational Methods On Theoretical Identification Of Essential Genes

Posted on: 2019-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: H L Hua
GTID: 2310330563454135
Subject: Biophysics
Abstract/Summary:
Investigating essential genes is important for defining minimal gene sets, discovering potential drug targets, and designing broad-spectrum antibiotics. Because essential genes perform irreplaceable functions in the survival and reproduction of a species, many researchers have worked to develop new methods for identifying them. From traditional wet-lab techniques to theoretical methods, the identification of essential genes remains a central problem. Our group has provided three online servers for predicting essential genes: Geptop, which is based on orthologous essential genes and evolutionary distance; CEG-Match, which is based on gene names; and pheg, which predicts human essential genes from the sequence composition of DNA sequences.

Inspired by the Geptop algorithm, we used orthologous essential genes as features and combined them with a support vector machine (SVM) to train the prediction model. The feature weight coefficients given by the SVM replace the original evolutionary-distance weights in Geptop. For the interspecies prediction of essential genes, 10-fold cross-validation on our 25 selected species gave a highest AUC (area under the ROC curve) of 0.9716. To predict essential genes in distantly related bacteria, we trained the prediction model on the species that had an AUC above 0.90 and was most closely related to the target species. Comparing the predictions with the known gene essentiality of the target species gave a highest AUC of 0.9552 and an average AUC of 0.8314. These results compare well with current research on essential genes.

Identifying bacterial essential genes helps in discovering potential drug targets. However, a useful antibacterial should not only inhibit the pathogen but also have no toxic side effects on the human body; that is, it should not interact with human essential genes. Identifying human essential genes is therefore necessary. We downloaded the genome data and gene essentiality data of human cancer cell lines and extracted three types of features, from the protein-protein interaction network, gene expression profiles, and GO functional annotation, to distinguish essential genes from non-essential genes. An SVM was used to build the prediction model. Each feature type was used on its own for prediction, giving AUC values of 0.8624, 0.8272, and 0.8706, respectively. Each achieved good predictions even though the latter two feature types had more missing values. We then built an integrated prediction model combining all of these features and obtained an AUC of 0.9401 in 10-fold cross-validation, which was the best prediction result to date.

This thesis focuses on the theoretical identification of essential genes in bacteria and in human cancer cell lines. Different biological characteristics were extracted for each case, according to its research status, to construct the prediction models, and both models gave good predictions. Even so, these methods need further study before they can provide a service tool that is widely applicable to the identification of essential genes.
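The abstract reports every result as an AUC (area under the ROC curve). As a reading aid, the following is a minimal sketch of how an AUC can be computed from prediction scores and known essentiality labels, using the standard rank-based identity (the AUC equals the probability that a randomly chosen essential gene scores higher than a randomly chosen non-essential gene). The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not code or data from the thesis.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: labels are 1 (essential) or 0 (non-essential).

    Tied scores receive their average rank, so a tie between a positive
    and a negative counts as half a correctly ordered pair.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the block [i, j) of items sharing the same score.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        positives_in_block = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_block
        i = j

    # Mann-Whitney U statistic for the positives, normalised to [0, 1].
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


# Hypothetical toy data: four genes, two of them essential.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In a 10-fold cross-validation such as the one described, a score like this would typically be computed on each held-out fold or on the pooled held-out predictions; which of the two the thesis used is not stated in the abstract.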
Keywords/Search Tags:essential genes, bacteria, human cancer cell lines, biological characteristics, machine learning methods