
Systematic Identification, Annotation, and Analysis of CRISPR-Cas Systems and Related Elements in Bacteria and Archaea

Posted on: 2021-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Tang
Full Text: PDF
GTID: 2370330623967945
Subject: Biophysics
Abstract/Summary:
CRISPR-Cas systems consist of clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) proteins. They are anti-phage immune systems found in many bacterial and most archaeal species. In recent years, a growing number of CRISPR-Cas systems have been developed into reliable and powerful genome-editing tools, because these systems can specifically recognize and cut genomes. However, the existing gene-editing tools based on CRISPR-Cas systems still do not meet the needs of researchers, so finding similar or better systems in bacteria and archaea remains crucial. The primary task in exploring different systems is to systematically identify and annotate putative Cas proteins and, by also identifying and annotating the other components of the system (Cas protein clusters, tracrRNA sequences, CRISPR sequences, and PAM sequences), to build an information center of candidate CRISPR-Cas systems in bacteria and archaea.

In this study, we first identified putative Cas proteins and Cas protein clusters in all bacteria and archaea. Second, we identified CRISPR sequences in 1838 bacteria and archaea. If a Cas protein cluster neighbors a CRISPR sequence in the genome, we regarded that cluster as a "Cas operon". We then classified the CRISPR-Cas system type of 1162 Cas operons from bacteria and archaea, of which 276 belong to type II, 1024 to type I, and 50 to type III CRISPR-Cas systems. Finally, potential tracrRNA sequences and PAM sequences of type II CRISPR-Cas systems were predicted. The tracrRNA1 sequence was predicted in 575 bacteria and archaea, and the tracrRNA2 sequence in 175 bacteria and archaea. A tracrRNA prediction model was built with the SVM algorithm from a set of experimentally identified tracrRNAs and reached an accuracy of more than 85% (this model is only an exploratory attempt, and its output does not count towards the final results). Spacer sequences from 481 bacteria and archaea were aligned with BLAST against viral genomes; from these, we selected 98 bacteria and archaea with a large number of BLAST hits in viral genomes to annotate PAM sequences.

Here, we constructed the Cas Protein Data Bank (CasPDB), an integrated and annotated online database of Cas proteins across bacteria and archaea. CasPDB contains 287 reviewed Cas proteins, 257,745 putative Cas proteins, and 3593 Cas operons from 32,023 bacterial species and 1802 archaeal species. The database can be freely browsed and searched. The CasPDB web interface presents all 3593 putative Cas operons and their components; 328 of these operons belong to the type II CRISPR-Cas system. CasPDB supports multiple retrieval methods, downloads, and visualization of the putative Cas operons and their components. CasPDB can provide researchers with data for further exploring the defense mechanisms of CRISPR-Cas systems in bacteria and archaea, and can supply candidate "scissors" for gene editing.
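The "Cas operon" rule described in the abstract (a Cas protein cluster that neighbors a CRISPR sequence in the genome) can be illustrated with a minimal sketch. The abstract does not state how close the two elements must be, so the `max_gap` threshold of 10,000 bp, the `Interval` type, and the function names below are hypothetical choices made for illustration only, not the thesis's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic feature: contig name plus start/end coordinates in bp."""
    contig: str
    start: int
    end: int


def gap(a: Interval, b: Interval) -> int:
    """Distance in bp between two features on the same contig (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def find_cas_operons(cas_clusters, crispr_arrays, max_gap=10_000):
    """Pair each Cas cluster with the first CRISPR array within max_gap bp.

    max_gap is an assumed threshold; the source does not specify one.
    Returns a list of (cas_cluster, crispr_array) tuples.
    """
    operons = []
    for cluster in cas_clusters:
        for array in crispr_arrays:
            if cluster.contig == array.contig and gap(cluster, array) <= max_gap:
                operons.append((cluster, array))
                break  # one neighboring array is enough to call an operon
    return operons


# Toy coordinates, invented purely for demonstration.
clusters = [
    Interval("chr", 1_000, 5_000),    # 500 bp from the array: counted
    Interval("chr", 50_000, 54_000),  # too far from the array: skipped
    Interval("plasmid", 100, 900),    # different contig: skipped
]
arrays = [Interval("chr", 5_500, 6_200)]
operons = find_cas_operons(clusters, arrays)
```

With these toy inputs only the first cluster is paired with the CRISPR array. A real pipeline would also need to handle strand, circular contigs, and several arrays per cluster, none of which the abstract specifies.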
Keywords/Search Tags: Bacteria and archaea, CRISPR-Cas system, Cas proteins, CRISPR sequence, CasPDB database