Research On Essential Genes Identification Based On DNA Sequence Features

Posted on: 2020-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Zhao
GTID: 2370330572983707
Subject: Control engineering
Abstract/Summary:
In recent years, with the rapid development of biotechnology, the amount of available biological data has been increasing exponentially. Large volumes of accurate biological data can be obtained from multiple biological databases, and how to analyze and mine these data accurately and efficiently, and extract the information they contain, has become a hot research topic. Essential genes play an important role in maintaining the normal life activities and reproduction of organisms; deleting such a gene leads to death or loss of reproductive capacity. In medical science, because essential genes are fundamental to the survival of organisms, they are potential targets for many antibiotics and anticancer compounds. They are widely used in eliminating pathogens and cancer cells and are of great significance in the development of antibiotics and vaccines. In synthetic biology, the minimal genome of a target cell can be selected to build the "chassis" of a living cell. In evolutionary biology, studying essential genes deepens the understanding of how organisms evolve, and studying the essential genes shared by related organisms supports the analysis of species homology.

However, the commonly used experimental methods for screening essential genes are costly, time-consuming, and labor-intensive, and have a narrow scope of application. To improve the efficiency of identifying essential genes, this research studied essential gene identification algorithms based on DNA sequence features from a bioinformatics perspective and proposed four effective classifiers for identifying target genes.

First, features were extracted from the primary DNA sequence using methods in three categories and 10 sub-categories: k-mers and reverse-complement k-mers (RevcKmer), based on nucleotide composition; DAC, DCC, DACC, TAC, TCC, and TACC, based on autocorrelation; and PseDNC and PseKNC, based on pseudo-nucleotide composition. Seven machine learning algorithms, namely support vector machine (SVM), decision tree (DT), random forest (RF), AdaBoost, k-nearest neighbor (k-NN), logistic regression (LR), and naive Bayes (NB), were used to classify the extracted DNA sequence features. The results were analyzed and evaluated with seven performance indicators, including true positive (TP) rate, false positive (FP) rate, precision, F-measure, Matthews correlation coefficient (MCC), and area under the ROC curve (AUC). To obtain better results, the feature extraction methods were combined. After parameter tuning and optimization, four classifiers were obtained: RF-4-RF, LR-3-LR, KmerDAC-RF, and KmerDAC-LR.

To validate the proposed classifiers, we used the E. coli essential genes from the PEC database as the training data set. In 10-fold cross-validation, the RF-4-RF classifier selected the k-mers, RevcKmer, DAC, and PseDNC feature sets and reached an AUC of 0.830, while the LR-3-LR classifier selected the DCC, DACC, and TAC feature sets and reached an AUC of 0.834. The KmerDAC-RF and KmerDAC-LR classifiers both used the k-mers and DAC feature sets, with AUC values of 0.827 and 0.799, respectively. In an AUC-based comparison with five state-of-the-art classifiers, the classifiers proposed in this thesis achieved better prediction performance. The accuracy, efficiency, and stability of these four classifiers indicate that they are effective at identifying essential genes and might be a useful tool in this field.
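
As an illustration of the nucleotide-composition features named in the abstract, the following is a minimal sketch of k-mer and reverse-complement k-mer (RevcKmer) frequency extraction. It is not the thesis's implementation: the function names, the normalization to relative frequencies, and the choice to skip windows containing non-ACGT characters are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

# Translation table for complementing A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_frequencies(seq: str, k: int) -> dict:
    """Relative frequency of every possible k-mer over A, C, G, T.

    Assumption: windows containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def revc_kmer_frequencies(seq: str, k: int) -> dict:
    """Merge each k-mer with its reverse complement into one feature.

    The lexicographically smaller of the two strings is used as the key
    (a hypothetical convention chosen for this sketch).
    """
    merged = {}
    for kmer, freq in kmer_frequencies(seq, k).items():
        key = min(kmer, reverse_complement(kmer))
        merged[key] = merged.get(key, 0.0) + freq
    return merged


if __name__ == "__main__":
    example = "ACGTAC"  # toy sequence, not data from the thesis
    print(kmer_frequencies(example, 2)["AC"])
    print(len(revc_kmer_frequencies(example, 2)))
```

For k = 2, merging reverse complements reduces the 16 dinucleotide features to 10, which is why RevcKmer vectors are shorter than plain k-mer vectors of the same k. Vectors like these could then be passed to any of the classifiers listed in the abstract.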
Keywords/Search Tags:Essential Genes, DNA Sequence Feature Extraction, Machine Learning, Computational Prediction