Research on a Viral Protein-Coding Gene Identification System

Posted on: 2020-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Gao
Full Text: PDF
GTID: 2370330596475253
Subject: Biophysics
Abstract/Summary:
In-depth research on viral genomes has been instrumental in many areas, especially in the treatment of human diseases caused by viral infections. With the rapid accumulation of viral sequencing data, more efficient gene recognition systems are needed to process and mine these data. In this work, we introduce Vgas, a system that combines an ab initio recognition method with a similarity-based approach to automatically find viral genes and annotate their functions. In tests on 5,705 viral genomes downloaded from RefSeq, Vgas outperformed the existing programs Prodigal, GeneMarkS and Glimmer, achieving the highest average accuracy and recall. The gain was most pronounced for small viral genomes (≤ 10 kb), where accuracy was 6% higher and recall was 2% higher. Vgas also provides an annotation function that supplies functional information on predicted genes based on BLASTp alignment, and we identified 86 genes that are missing from the RefSeq database. Further tests showed that using Vgas in combination with GeneMarkS and Prodigal gives better predictions than any of the three programs alone, which suggests that collaborative prediction with different programs benefits gene prediction. The program is freely available at http://cefg.uestc.cn/vgas/. However, Vgas does not perform well on phages and dsDNA virus species. To address this limitation, we explored a deep learning approach to viral gene finding, using a convolutional neural network with 8 layers in total. In five-fold cross-validation on a data set constructed from the UniProt database, the F value reached 98%. With further work, this approach could lead to a new system that compensates for the current deficiencies of Vgas.
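The evaluation and the joint use of several programs described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: genes are represented as (start, end, strand) tuples, a prediction counts as correct only when it matches a reference gene exactly, "accuracy" is read as the fraction of predicted genes that are correct, and the programs are combined by a simple majority vote. The function names `evaluate` and `majority_vote` and the toy coordinates are hypothetical.

```python
from collections import Counter


def evaluate(predicted, reference):
    """Return (accuracy, recall, F value) for exact-match gene predictions.

    Assumption: "accuracy" is the share of predicted genes found in the
    reference, and the F value is the harmonic mean of accuracy and recall.
    """
    pred, ref = set(predicted), set(reference)
    true_positives = len(pred & ref)
    accuracy = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = accuracy + recall
    f_value = 2 * accuracy * recall / total if total else 0.0
    return accuracy, recall, f_value


def majority_vote(*prediction_sets, min_votes=2):
    """Keep genes reported by at least `min_votes` programs (assumed rule)."""
    votes = Counter(gene for preds in prediction_sets for gene in set(preds))
    return sorted(gene for gene, count in votes.items() if count >= min_votes)


# Toy data: hypothetical gene coordinates, not real annotations.
reference = [(1, 300, "+"), (400, 900, "-"), (1000, 1500, "+")]
tool_a = [(1, 300, "+"), (400, 900, "-"), (1600, 1800, "+")]
tool_b = [(1, 300, "+"), (1000, 1500, "+")]
tool_c = [(400, 900, "-"), (1000, 1500, "+"), (2000, 2200, "-")]

combined = majority_vote(tool_a, tool_b, tool_c)
print("tool_a alone:", evaluate(tool_a, reference))
print("combined    :", evaluate(combined, reference))
```

In this toy case each single program misses a gene or reports a spurious one, while the vote of the three recovers exactly the reference set. Real comparisons would likely need a looser matching rule, for example matching on the stop codon position only, which this sketch does not attempt.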
Keywords/Search Tags: viral gene finding, functional annotation, novel genes, joint application of multiple programs, deep learning