
Selection Of Training Set In Predicting Essential Genes And Target Identification In The Bacterial Pathogens

Posted on: 2016-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhao
GTID: 2180330461466362
Subject: Bioinformatics
Abstract/Summary:
With the development of bioinformatics, genomics, statistics and related fields in recent years, and the accumulation of data in the relevant databases, useful data resources are now available for studying essential genes and for using essential genes to identify drug targets in pathogens.

Essential genes are genes that are indispensable for the survival and development of an organism. Studying them provides important information for both theory and practice, so essential genes have long been a research subject in biology and related disciplines. Essential genes in organisms that have not yet been studied can be predicted from known or confirmed essential genes, and the accuracy of such predictions depends on many factors, including the selection of the training set. To improve the accuracy of essential gene prediction, we studied how the training set should be selected.

Most infectious diseases are caused by bacterial pathogens, and the mortality of infections caused by bacterial pathogens is increasing, which raises the demand for novel drug targets. It is therefore necessary to identify potential drug targets in bacterial pathogens, and we used bioinformatic methods to identify drug targets in 9 bacterial pathogens.

The research is presented in the following two parts.

Part Ⅰ: We predicted and validated essential genes between and among 21 species using a naïve Bayes classifier, and studied how the selection of the training set affects the accuracy of essential gene prediction. The results showed that the selection of the training set strongly influences the predictive accuracy.
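The abstract does not state which gene features the naïve Bayes classifier uses. As a minimal sketch, assuming a Gaussian naïve Bayes model over numeric per-gene features, cross-species prediction amounts to estimating class statistics from labeled genes of one species and then scoring the genes of another species. The feature values below are invented toy numbers, and the function names (`fit_gaussian_nb`, `predict`, `accuracy`) are hypothetical illustrations, not the thesis's implementation.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a log prior and per-feature mean/variance for each class label."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Population variance plus a small constant so no variance is zero.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-6
                     for c, m in zip(cols, means)]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the label with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for x, m, v in zip(row, means, variances):
            # Log of the Gaussian likelihood, features assumed independent.
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, X, y):
    """Fraction of genes whose predicted label matches the known label."""
    return sum(predict(model, row) == label for row, label in zip(X, y)) / len(y)


# Hypothetical toy data: two made-up numeric features per gene,
# label 1 = essential, 0 = non-essential.
# "Training species": genes with known essentiality.
train_X = [[0.9, 0.8], [0.85, 0.75], [0.8, 0.9], [0.2, 0.3], [0.25, 0.1], [0.1, 0.2]]
train_y = [1, 1, 1, 0, 0, 0]
# "Target species": genes whose labels are used only to validate the prediction.
test_X = [[0.88, 0.82], [0.15, 0.25], [0.7, 0.85], [0.3, 0.2]]
test_y = [1, 0, 1, 0]

model = fit_gaussian_nb(train_X, train_y)
print(accuracy(model, test_X, test_y))  # 1.0
```

In this framing, the choice of training set discussed in the abstract corresponds to choosing which species' labeled genes are passed as `train_X` and `train_y`.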
Based on these results, we propose four criteria for selecting a training set:
1) the essential genes used to build the training set must be reliable;
2) the essential genes of the organisms in the training set and the test set should have been identified under the same growth conditions;
3) the organisms in the training set should be closely related to the target species;
4) the organisms in the training set and the test set should share the same phenotype and lifestyle.

We then analyzed how incomplete training sets and integrated training sets affect predictive accuracy. We found that the training set should contain at least 10% of the genes in the full genome; at that size the predictive accuracy increases significantly. In addition, an integrated training set outperforms a single training set in both predictive stability and accuracy. Finally, we compared an integrated training set selected according to the four criteria with an integrated training set selected at random, and found that the former gives clearly higher predictive accuracy. Our research therefore provides guidance on selecting training sets for identifying essential genes.

Part Ⅱ: We collected and compared the essential genes and homologous genes of humans and nine bacterial pathogens, and identified four broad-spectrum drug targets in the nine bacterial pathogens together with their corresponding small-molecule inhibitors. Building on existing research on drug target identification, we propose three criteria for identifying broad-spectrum drug targets:
1) the genes should be homologous genes shared by the pathogens studied;
2) the genes should be essential, i.e. vital for the pathogens' survival;
3) the genes should have no homologs in humans, to avoid toxicity and side effects in humans.
Our research provides useful information for drug design.
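The three criteria in Part Ⅱ can be expressed as set operations over homolog groups. This is a minimal sketch, assuming every gene has already been assigned to a homolog group by some external homology search, and reading criterion 2 strictly as "essential in every pathogen studied" (an assumption; the abstract does not say how essentiality is combined across pathogens). The pathogen names, group IDs and the function `broad_spectrum_targets` are hypothetical placeholders.

```python
from functools import reduce


def broad_spectrum_targets(homologs, essentials, human_groups):
    """Apply the three target-selection criteria as set operations.

    homologs:     pathogen -> set of homolog-group IDs present in its genome
    essentials:   pathogen -> set of homolog-group IDs of its essential genes
    human_groups: set of homolog-group IDs that have a human homolog
    """
    # Criterion 1: keep homolog groups shared by every pathogen studied.
    shared = reduce(set.intersection, homologs.values())
    # Criterion 2 (assumed strict reading): essential in every pathogen.
    essential_everywhere = reduce(set.intersection, essentials.values())
    # Criterion 3: drop groups with a human homolog to limit host toxicity.
    return (shared & essential_everywhere) - human_groups


# Hypothetical toy data: three pathogens and five homolog groups.
homologs = {
    "pathogen_A": {"G1", "G2", "G3", "G4"},
    "pathogen_B": {"G1", "G2", "G3", "G5"},
    "pathogen_C": {"G1", "G2", "G3"},
}
essentials = {
    "pathogen_A": {"G1", "G2", "G4"},
    "pathogen_B": {"G1", "G2", "G5"},
    "pathogen_C": {"G1", "G2", "G3"},
}
human_groups = {"G2"}

targets = broad_spectrum_targets(homologs, essentials, human_groups)
print(sorted(targets))  # ['G1']
```

Here G3 is shared but not essential everywhere, G4 and G5 are not shared, and G2 is excluded because it has a human homolog, leaving G1 as the only candidate.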
Keywords: essential gene, training set, drug target