The Research Of Bacterial Protein Coding Genes Identification System

Posted on: 2018-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Yuan
GTID: 2310330512488846
Subject: Biomedical engineering
Abstract/Summary:
Gene identification is the first step in analyzing DNA sequences and extracting important information from them. As sequencing technology develops, ever more sequence data are being generated, and conventional experimental methods cannot obtain gene information from these massive datasets quickly and effectively. Computational gene prediction has therefore become the prevailing approach. In 2015, our group developed the prokaryotic gene identification program ZCURVE3.0, which combines Z curve theory with the support vector machine (SVM) algorithm, adds new characteristic variables, and further optimizes internal parameters so that coding ORFs are identified more quickly and accurately. Once coding genes have been predicted, the proteins they encode must also be identified. We therefore built an integrated visualization system on top of ZCURVE3.0 by adding the relevant functions.

The first part of this gene identification visualization system, BAGA, implements the functionality of ZCURVE3.0 while retaining all components of its algorithm. By testing the whole genomes of 50 prokaryotes and selecting an alignment threshold, we excluded pseudo-genes while deleting as few correct genes as possible, so that the number of correctly predicted genes stayed essentially unchanged. Compared with the total number of genes annotated in GenBank, BAGA's gene detection rate is 97.60%. Its specificity is 96.74%, about 2.5 percentage points higher than that of ZCURVE3.0 (94.21%). BAGA's additional prediction rate is 3.34%, about 2.7 percentage points lower than that of ZCURVE3.0 (6.08%); this reduction indicates that the number of false predictions is greatly reduced.

The second part, BAGA2.0, combines two gene prediction programs, ZCURVE3.0 and Prodigal. Genes predicted identically by both programs are kept; the sequences of genes predicted by only one program are aligned, and, with appropriately adjusted parameters, some of them are retained. Together, these two groups of genes form the final joint prediction. BAGA2.0 achieves a prediction accuracy (gene detection rate) of 98.73% and a specificity of 96.09%, better accuracy and specificity than ZCURVE3.0. Its additional prediction rate is 4.08%, lower than that of ZCURVE3.0 (6.08%). In summary, compared with BAGA, BAGA2.0 has lower specificity but higher prediction and recognition rates, and hence higher accuracy.

We also implemented functional annotation of the genes predicted by both parts, annotating them quickly and comprehensively with multiple methods. A genomic island prediction program is integrated into the system as well. Several free versions of the system have been compiled for users' convenience: http://cefg.cn/zcurve-visualization/.
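The Z curve that ZCURVE3.0 builds on represents a DNA sequence as a three-dimensional path. The sketch below computes the standard cumulative coordinates, where A_n, C_n, G_n and T_n count each base in the first n positions: x_n = (A_n + G_n) - (C_n + T_n), y_n = (A_n + C_n) - (G_n + T_n), z_n = (A_n + T_n) - (G_n + C_n). It also derives nine phase-specific variables (one x, y, z triple per codon position), the kind of input a Z-curve gene finder might pass to an SVM. The function names and the nine-variable feature set are illustrative assumptions; ZCURVE3.0's actual variables and trained parameters are not described in this abstract.

```python
from collections import Counter


def z_curve(seq: str) -> list[tuple[int, int, int]]:
    """Cumulative Z curve coordinates (x_n, y_n, z_n) for every prefix of seq."""
    counts = Counter()
    points = []
    for base in seq.upper():
        counts[base] += 1  # non-ACGT symbols are counted but do not move the curve
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        x = (a + g) - (c + t)  # purine vs pyrimidine
        y = (a + c) - (g + t)  # amino vs keto
        z = (a + t) - (g + c)  # weak vs strong hydrogen bonding
        points.append((x, y, z))
    return points


def phase_specific_features(orf: str) -> list[float]:
    """Nine illustrative Z-curve variables: an (x, y, z) triple per codon position.

    Hypothetical feature set; a real ZCURVE-style predictor uses more variables.
    """
    orf = orf.upper()
    features = []
    for phase in range(3):
        bases = orf[phase::3]  # all bases at codon position phase + 1
        n = len(bases) or 1  # avoid division by zero on very short input
        f = {b: bases.count(b) / n for b in "ACGT"}
        features.extend([
            (f["A"] + f["G"]) - (f["C"] + f["T"]),
            (f["A"] + f["C"]) - (f["G"] + f["T"]),
            (f["A"] + f["T"]) - (f["G"] + f["C"]),
        ])
    return features
```

For example, `z_curve("ATG")` returns `[(1, 1, 1), (0, 0, 2), (1, -1, 1)]`. Feature vectors from `phase_specific_features` for known coding and non-coding ORFs could then train a classifier; that training step is omitted here.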
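The abstract reports a gene detection rate, a specificity and an additional prediction rate without defining them. The sketch below assumes common definitions: detection rate = matched predictions / annotated genes, specificity = matched predictions / all predictions, and additional prediction rate = unmatched predictions / annotated genes. It also assumes that a prediction matches an annotation when both share the same stop codon position and strand. These definitions, the `GeneKey` type and the per-genome averaging in `mean_rates` are assumptions; the thesis's exact definitions may differ.

```python
GeneKey = tuple[int, str]  # (stop codon position, strand); assumed matching key


def evaluate(predicted: set[GeneKey], annotated: set[GeneKey]) -> dict[str, float]:
    """Per-genome rates under the assumed definitions above."""
    matched = len(predicted & annotated)  # predictions that hit an annotated gene
    extra = len(predicted - annotated)  # predictions with no annotated counterpart
    return {
        "detection_rate": matched / len(annotated),
        "specificity": matched / len(predicted),
        "additional_rate": extra / len(annotated),
    }


def mean_rates(per_genome: list[dict[str, float]]) -> dict[str, float]:
    """Average each rate over several genomes (hypothetical aggregation)."""
    keys = per_genome[0].keys()
    return {k: sum(r[k] for r in per_genome) / len(per_genome) for k in keys}
```

With four annotated genes and four predictions, three of which match, `evaluate` gives a detection rate and specificity of 0.75 and an additional prediction rate of 0.25.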
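One way to read BAGA's threshold selection is as a constrained sweep. On genomes with trusted annotation, choose the alignment-score cutoff that discards the most predictions lacking an annotated counterpart while losing at most a fixed number of annotated genes. The chosen cutoff is then applied to new genomes. The sketch assumes each prediction already carries a best alignment score (for example from a homology search) and an annotation flag. The `Prediction` type, the score source and the `max_lost` parameter are hypothetical and not taken from BAGA.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    gene_id: str
    score: float  # best alignment score of the predicted protein (hypothetical input)
    is_annotated: bool  # True if the prediction matches an annotated gene


def choose_threshold(preds: list[Prediction], max_lost: int = 0) -> float:
    """Smallest cutoff that removes the most unannotated predictions
    while discarding at most max_lost annotated genes."""
    best_cutoff, best_removed = 0.0, -1
    for cutoff in sorted({p.score for p in preds}):
        # Predictions scoring below the cutoff would be discarded.
        lost = sum(p.is_annotated and p.score < cutoff for p in preds)
        removed = sum(not p.is_annotated and p.score < cutoff for p in preds)
        if lost <= max_lost and removed > best_removed:
            best_cutoff, best_removed = cutoff, removed
    return best_cutoff


def filter_predictions(preds: list[Prediction], cutoff: float) -> list[Prediction]:
    """Keep predictions whose alignment score reaches the cutoff."""
    return [p for p in preds if p.score >= cutoff]
```

Setting `max_lost=0` mirrors the stated goal of keeping the number of correct genes unchanged; a small positive value trades a few lost genes for more excluded pseudo-genes.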
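The BAGA2.0 combination step can be sketched as set operations. Genes predicted by both ZCURVE3.0 and Prodigal are kept, and a gene predicted by only one of them is kept if its alignment score reaches a cutoff. The gene key (stop position, strand), the score dictionary and the cutoff value are assumptions for illustration; the abstract does not specify the actual matching rule or parameters.

```python
GeneKey = tuple[int, str]  # (stop codon position, strand); hypothetical matching key


def combine_predictions(
    zcurve: set[GeneKey],
    prodigal: set[GeneKey],
    scores: dict[GeneKey, float],
    cutoff: float,
) -> set[GeneKey]:
    """Joint prediction: consensus genes plus disputed genes passing the cutoff."""
    consensus = zcurve & prodigal  # predicted identically by both programs
    disputed = zcurve ^ prodigal  # predicted by exactly one program
    # Keep a disputed gene only if its alignment score reaches the cutoff;
    # genes without a recorded score are treated as scoring 0.
    rescued = {g for g in disputed if scores.get(g, 0.0) >= cutoff}
    return consensus | rescued
```

For instance, if both programs predict genes ending at (300, "+") and (1200, "-"), and only ZCURVE3.0 predicts (2500, "+") with score 35 while only Prodigal predicts (3100, "+") with score 210, a cutoff of 60 keeps the two consensus genes plus (3100, "+").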
Keywords/Search Tags: gene prediction, ZCURVE3.0, BAGA2.0, accuracy rate, additional prediction rate