Analysis Of Conserved Features And Automatic Recognition Of Acr Genes

Posted on: 2021-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: D K Pu
GTID: 2370330623467944
Subject: Biophysics
Abstract/Summary:
Gene editing plays an increasingly important role in life-science research and applications. In recent years, anti-CRISPR (Acr) proteins, which inhibit the function of CRISPR-Cas systems, have become a focus of research. Acrs protect mobile genetic elements (MGEs) in bacterial cells and can serve as tools for regulating gene-editing systems. Some bacteriophages rely on Acrs to inhibit the host's CRISPR-Cas system, which allows them to infect bacteria and integrate their genetic material into the host genome. Because our understanding of Acrs is still very limited, researchers currently rely on a relatively narrow range of methods to locate candidate Acrs in bacterial genomes and then verify them experimentally, which is time-consuming and laborious. A systematic analysis of Acr-related features and the design of a complete Acr recognition system would therefore greatly advance our understanding and identification of Acrs. To this end, we systematically investigated the characteristics of Acrs and, using machine learning, constructed an accurate Acr recognition system based on a decision tree.

In this work, we analyzed the characteristics of Acrs from five perspectives: 1) compared with non-Acr proteins, Acrs have shorter sequences, with lengths distributed in the range of 81 to 234 aa; 2) in GenBank, Acrs are usually annotated as hypothetical proteins, whereas non-Acrs have specific annotated functions; 3) a large share of Acr-encoding genes (66.7%) are located on genomic islands, most of which (81.8%) belong to prophages; 4) unlike non-Acrs, Acrs usually have an HTH (helix-turn-helix) domain a short distance downstream; 5) codon usage bias differs significantly between Acr-encoding genes and non-Acr-encoding genes.

To collect sufficient data for building the decision tree classification model, we obtained 2655 Acr homologs (distributed across 1413 genomes) as the data set through BLAST searches and strict filtering criteria. Using grid search and cross-validation, we obtained the best training parameters and constructed the decision tree classification model. In cross-validation, our model achieved an AUC of 0.91; for positive samples, the precision was 79%, the recall was 81%, and the F1-score was 0.80. In the 5 independent test sets, the average precision was 64.6%, the recall was 90%, and the F1-score was 0.75, and the number of Acrs in the predicted results was below 10 in every test, which greatly reduces the cost of subsequent experimental verification. Finally, to make our model widely available, we implemented a complete Acr prediction pipeline and built an online service, AcrDetector (http://cefg.uestc.cn/acrDetector); a local version is also available (https://github.com/pudongkai/acrDetector.git).
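
The F1-scores reported in the abstract can be cross-checked against the other two figures, since F1 is the harmonic mean of precision and recall. The sketch below is only an arithmetic check, not part of AcrDetector. The function name `f1_score` is a hypothetical helper, and the code assumes that the 79% figure given for positive samples in cross-validation is the precision.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures taken from the abstract (the 79% value is assumed to be precision).
cross_validation_f1 = f1_score(0.79, 0.81)
independent_test_f1 = f1_score(0.646, 0.90)

print(f"cross-validation F1: {cross_validation_f1:.2f}")   # 0.80
print(f"independent-test F1: {independent_test_f1:.2f}")   # 0.75
```

Both values agree with the reported F1-scores of 0.80 and 0.75 after rounding to two decimals.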
Keywords/Search Tags:anti-CRISPR, Acr, Acr prediction, decision tree classification model