
Development Of Comprehensive Bioinformatic System For General RGEN Target Selection And Database Construction For Genome-wide Optimal Loci

Posted on: 2018-02-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Tong
GTID: 1360330515950965
Subject: Developmental Biology
Abstract/Summary:
Site-specific genome targeting has become widespread in research on a wide range of organisms owing to the development of engineered endonucleases. In recent years, Streptococcus pyogenes Cas9 (SpCas9) has become the most popular RNA-guided DNA endonuclease (RGEN) for targeted genome modification because of its high efficiency, versatility and ease of use. However, although the on-target activity of CRISPR/Cas systems has been widely acknowledged and exploited, the high probability of off-target mutation remains the major problem limiting their range of application and effectiveness. Fortunately, analyses of off-target effects have revealed certain regularities. Previous studies showed that the Cas9 system can tolerate as many as 10 mismatched bases in the single guide RNA (sgRNA) sequence, and that the protospacer-adjacent motif (PAM)-proximal region is crucial for Cas9 binding specificity. Therefore, the safety of genome editing is a key consideration whichever of the prevalent techniques is adopted. At present, GUIDE-seq is the most accurate method for identifying RGEN-induced off-target cleavage genome-wide. In Cas9 and Cpf1 genome-editing studies, GUIDE-seq verified that certain targets have very few off-target effects, and these target sites were also predicted by Cas-OFFinder to have fewer potential off-targets. This indicates that in silico analysis of unwanted off-targets is a crucial step in assessing the specificity and safety of target loci.

Several RGEN tools have been developed to design target sequences and predict off-target sites for CRISPR/Cas systems. Most of them are provided as online websites with a limited set of species options, which makes them convenient only for analyzing small numbers of target loci. Some, such as CasOT (a set of Perl scripts), offer downloadable source code but are difficult for biologists to use because of the programming language involved. Furthermore, the biggest limitation of current programs is incomplete off-target simulation caused by algorithmic restrictions. An increasing number of studies now use RGENs in large-scale genetic screens to identify essential genes related to cell survival, drug resistance and tumor growth. Hence, there is an urgent need for straightforward biological software suited to large-batch RGEN editing. In this study, we present TQPF (Tremendous Qualified Pattern Finder), an offline Java software package for RGEN target locus selection and one-step, accurate genome-wide off-target analysis. Using TQPF and a high-performance computing platform, we scanned the complete bovine genome and human tumor- and embryo-related genes to generate a database of optimal CRISPR/Cas9 target loci, named CRISPR_Base, which is provided as a MySQL database. Based on statistical analysis of all identified off-target information, we propose a target rating system as a reference standard for prioritizing target sites in cattle, humans or other organisms. The main work of this research is as follows.

1. We developed TQPF, standalone high-throughput software for pattern finding and genome-wide screening of potential off-targets, which is applicable to any pattern-matching task and especially to all RGEN-derived systems. The TQPF package consists of three functional modules: “Pattern Finder”, “OT Searcher” and “One Step Off-Target Analysis”. The algorithm used in TQPF follows a “Water-drop and Tube” model, which is highly efficient and minimizes undetected potential off-targets. Its highly customizable design, user-friendly interface, convenient and powerful analysis capabilities, and extensible range of applications make this program superior to available RGEN tools.
2. To characterize ideal target loci and the regularities of off-target occurrence, we compared TQPF output with available GUIDE-seq data. To determine suitable parameters for genome-wide CRISPR/Cas9 target locus selection, we used Pattern Finder to extract N20NGG target loci from randomly sampled genomic frames and used OT Searcher to evaluate the number of potential off-targets of each target site. The results showed that real off-target mutations occur with high probability at loci with few mismatches, and that raw data size increases exponentially with the mismatch number. Balancing storage capacity against the need to retain sufficient information, we collected raw data with a maximum of 6 mismatches to assess the overall off-target situation.
3. We devised a simple procedure to build “CRISPR_Base”, a comprehensive MySQL database based on TQPF output data, for retrieving optimal target sites and summary information on their potential off-targets. CRISPR_Base contains two main sub-databases: BovineTargets_OffTargetsData for the entire bovine genome, and HumanTargets_OffTargetsData for human genes related to tumors and embryonic development. CRISPR_Base supports large-scale storage, retrieval and updating of target loci. This complete workflow for genome-wide target filtering can be applied widely to other species.
4. We analyzed the genome-wide distribution characteristics of preferred CRISPR/Cas9 target sites in CRISPR_Base to provide guidance and a parametric model for large-scale target locus selection in other species. CRISPR_Base contains a total of 2,293,508 bovine and 727,886 human target loci, screened by a unique-target filtering procedure. Furthermore, we proposed a ranking system based on genome-wide statistical regularities of off-targets to rapidly prioritize target loci, which could serve as a guiding rule for evaluating and choosing safe targets.
5. To make the most of the prioritized Cas9 targets, we used Pattern Finder to search CRISPR_Base for the “TTTVN20NGG” pattern, which contains overlapping target sites for both Cpf1 and Cas9. A total of 36,124 bovine sites and 11,018 human sites matched this common pattern and share the off-target data of the corresponding Cas9 targets; these sites were annotated in CRISPR_Base for retrieval. We then analyzed the proportion, location distribution and gene coverage of these common targets in the database. The results demonstrate that a considerable number of prioritized target loci common to the Cpf1 and Cas9 endonucleases exist in the bovine and human genomes, and these could be used in genome editing for gene therapy, functional identification and exogenous fragment insertion.
6. To facilitate the use of site-specific genome editing in generating a cow mammary gland bioreactor, we analyzed the sequences of four major milk protein genes (CSN1S1, CSN1S2, CSN2 and BLG) and several intergenic regions on chromosomes 24 and 28 to select CRISPR/Cas9 target loci. We then tested candidate loci that were located in suitable positions and had relatively few potential off-targets, using single-strand annealing (SSA) recombination assays to measure DSB and HDR efficiency. The results showed that all candidate targets could be cleaved by Cas9 endonuclease and repaired via homologous recombination, and that efficiency at the casein target loci was higher than at the BLG and intergenic targets.

In summary, TQPF is the first standalone high-throughput software for user-defined motif screening and analysis that can be operated easily on both Windows and Linux systems. This is also the first study to provide a MySQL database of detailed, prioritized target locus information for convenient retrieval, and to reveal the genome-wide characteristics of RGEN target loci and off-targets. We expect this research to facilitate RGEN-mediated site-specific editing and to reduce potential safety hazards in its applications.
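Items 1 and 2 describe Pattern Finder extracting motif-defined target loci such as N20NGG. TQPF's own “Water-drop and Tube” algorithm is not detailed here, so the following is only a minimal Python sketch of generic motif scanning under stated assumptions: patterns use a small IUPAC subset (A, C, G, T, N, V) in which a letter may be followed by a repeat count, both strands are searched, and a lookahead regular expression is used so that overlapping sites are all reported. The function names are hypothetical and are not TQPF's interface.

```python
import re
from typing import List, Tuple

# Subset of IUPAC nucleotide codes, enough for N20NGG and TTTVN20NGG (assumption).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "V": "[ACG]"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement; characters other than A/C/G/T are left unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a pattern like 'N20NGG' (letter plus optional repeat count) to a regex.

    The pattern is wrapped in a capturing lookahead so overlapping hits are kept.
    """
    tokens = re.findall(r"([ACGTNV])(\d*)", pattern)
    if "".join(code + count for code, count in tokens) != pattern:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    body = "".join(IUPAC[code] + (f"{{{count}}}" if count else "") for code, count in tokens)
    return re.compile("(?=(" + body + "))")


def find_sites(seq: str, pattern: str) -> List[Tuple[int, str, str]]:
    """Return sorted (start, strand, site) hits on both strands.

    Starts are 0-based forward-strand coordinates of the leftmost base of the site.
    """
    seq = seq.upper()
    regex = pattern_to_regex(pattern)
    hits = [(m.start(), "+", m.group(1)) for m in regex.finditer(seq)]
    n = len(seq)
    for m in regex.finditer(revcomp(seq)):
        site = m.group(1)
        # A hit at position p of the reverse complement covers seq[n-p-len : n-p].
        hits.append((n - m.start() - len(site), "-", site))
    return sorted(hits)
```

For example, `find_sites(seq, "N20NGG")` lists candidate SpCas9 sites, and the same call with `"TTTVN20NGG"` lists the combined Cpf1/Cas9 pattern from item 5. A regex scan like this is easy to read but far too slow for whole mammalian genomes; it only illustrates what a pattern-finding step computes.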
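OT Searcher evaluates how many potential off-targets each target site has at each mismatch number. As a rough illustration only (a brute-force scan, not the TQPF algorithm), the sketch below counts, for one 20-nt spacer, every NGG-adjacent window on either strand by its number of mismatches, up to the max-mismatch 6 cutoff mentioned in item 2. It assumes substitutions only: DNA/RNA bulges and alternative PAMs such as NAG are ignored, and the function name is hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement; characters other than A/C/G/T are left unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatch_profile(genome: str, spacer: str, max_mm: int = 6) -> Counter:
    """Count NGG-adjacent sites in `genome` by their mismatch number to `spacer`.

    Every window of len(spacer) bases followed by an NGG PAM is compared base by
    base with the spacer on both strands; windows with more than `max_mm`
    mismatches are discarded. Key 0 includes the on-target site itself.
    """
    genome = genome.upper()
    spacer = spacer.upper()
    k = len(spacer)
    profile: Counter = Counter()
    for strand in (genome, revcomp(genome)):
        for i in range(len(strand) - k - 2):
            if strand[i + k + 1 : i + k + 3] != "GG":
                continue  # no NGG PAM directly after this window
            mm = sum(a != b for a, b in zip(strand[i : i + k], spacer))
            if mm <= max_mm:
                profile[mm] += 1
    return profile
```

The returned `Counter` maps mismatch number to site count, which is the kind of per-target summary that item 3 says is stored alongside each target.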
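Item 2 reports that raw data size grows exponentially as the mismatch cutoff rises. One way to see why (a back-of-the-envelope model, not the dissertation's analysis) is to count the distinct 20-mers within k substitutions of a spacer, the sum over i = 0..k of C(20, i) * 3^i, and convert that into expected chance hits in a random genome. The uniform random-sequence assumption and the 3 Gb genome size used below are illustrative only; real genomes with repeats deviate strongly from this model.

```python
from math import comb


def neighborhood_size(length: int, max_mm: int, alphabet: int = 4) -> int:
    """Number of distinct sequences within `max_mm` substitutions of a fixed sequence.

    For i mismatches: choose the i positions, then one of (alphabet - 1)
    alternative bases at each of them.
    """
    return sum(comb(length, i) * (alphabet - 1) ** i for i in range(max_mm + 1))


def expected_random_hits(genome_size: float, max_mm: int, spacer_len: int = 20) -> float:
    """Expected NGG-adjacent matches within `max_mm` mismatches in a random genome.

    Assumes independent, uniformly distributed bases on both strands:
    P(NGG) = 1/16 and P(one specific spacer_len-mer) = 4 ** -spacer_len.
    """
    per_position = neighborhood_size(spacer_len, max_mm) / 4 ** spacer_len / 16
    return 2 * genome_size * per_position


for k in range(7):
    print(k, neighborhood_size(20, k), round(expected_random_hits(3e9, k), 3))
```

For a 20-nt spacer the neighborhood holds 1, 61, 1,771, 32,551, 424,996, 4,192,468 and 32,448,508 sequences for k = 0 to 6, so each additional mismatch multiplies the search space by a factor between about 8 and 61.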
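Item 3 describes CRISPR_Base as a MySQL database of target sites with summary off-target information. Its actual schema is not given here, so the sketch below uses a hypothetical single-table layout (table `targets`, columns `mm0` to `mm6` holding off-target counts per mismatch number) and Python's built-in `sqlite3` module as a stand-in for MySQL. All table, column and function names are assumptions for illustration.

```python
import sqlite3
from typing import List, Sequence

# Hypothetical columns: off-target counts (on-target excluded) for 0..6 mismatches.
MM_COLS = [f"mm{i}" for i in range(7)]


def create_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for a target/off-target summary table."""
    con = sqlite3.connect(":memory:")
    cols = ", ".join(f"{c} INTEGER NOT NULL" for c in MM_COLS)
    con.execute(
        "CREATE TABLE targets (target_id TEXT PRIMARY KEY, chrom TEXT, "
        f"start INTEGER, strand TEXT, gene TEXT, sequence TEXT, {cols})"
    )
    return con


def add_target(con: sqlite3.Connection, target_id: str, chrom: str, start: int,
               strand: str, gene: str, sequence: str, counts: Sequence[int]) -> None:
    """Insert one target with its per-mismatch off-target counts."""
    if len(counts) != len(MM_COLS):
        raise ValueError("expected one off-target count per mismatch level 0..6")
    placeholders = ", ".join("?" * (6 + len(MM_COLS)))
    con.execute(f"INSERT INTO targets VALUES ({placeholders})",
                (target_id, chrom, start, strand, gene, sequence, *counts))


def best_targets(con: sqlite3.Connection, gene: str, limit: int = 5) -> List[str]:
    """Retrieve a gene's targets, fewest low-mismatch off-targets first."""
    order = ", ".join(MM_COLS)
    rows = con.execute(
        f"SELECT target_id FROM targets WHERE gene = ? ORDER BY {order}, target_id LIMIT ?",
        (gene, limit),
    )
    return [row[0] for row in rows]
```

Keeping one count column per mismatch level lets retrieval and ordering happen in plain SQL, so the same statements should carry over to MySQL with little change.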
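Item 4 proposes a ranking system based on genome-wide off-target statistics but does not spell out its rules here. Because item 2 found that real off-target mutations occur mostly at low-mismatch loci, one hypothetical way to prioritize targets is a lexicographic comparison of per-mismatch off-target counts, so that a single off-target with fewer mismatches outweighs any number of off-targets with more. This is an assumption for illustration, not the dissertation's rating system.

```python
from typing import Dict, List, Sequence


def rank_targets(profiles: Dict[str, Sequence[int]]) -> List[str]:
    """Order target IDs so that targets with fewer close off-targets come first.

    profiles[target][i] is the number of off-target sites (on-target excluded)
    with exactly i mismatches; all profiles are assumed to share the same
    maximum mismatch number. Counts are compared lexicographically, and ties
    are broken by target ID so the order is deterministic.
    """
    return sorted(profiles, key=lambda t: (tuple(profiles[t]), t))


example = {"t1": [0, 0, 1, 5], "t2": [0, 1, 0, 0], "t3": [0, 0, 0, 40]}
print(rank_targets(example))
```

A strict lexicographic order is the simplest rule consistent with the low-mismatch finding; a weighted score would instead let many distant off-targets outweigh one close one, which is a different design choice.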
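Item 5 searches CRISPR_Base for Cas9 sites that also fit the “TTTVN20NGG” pattern, that is, sites where a TTTV motif (V = A, C or G) immediately precedes the N20NGG Cas9 site, so the common Cpf1/Cas9 site can reuse the Cas9 off-target record. A minimal sketch of that filter, assuming Cas9 sites are given as 0-based forward-strand start coordinates (a hypothetical input format), might look like this:

```python
import re
from typing import Iterable, List

_CAS9_SITE = re.compile(r"[ACGT]{20}[ACGT]GG")  # N20NGG
_TTTV = re.compile(r"TTT[ACG]")                 # TTTV upstream motif


def common_sites(seq: str, cas9_starts: Iterable[int]) -> List[int]:
    """Return starts of N20NGG Cas9 sites that are immediately preceded by TTTV.

    Each start is checked with fullmatch on exact slices of `seq`, so the
    23-nt site and the 4-nt upstream motif must both match exactly.
    """
    seq = seq.upper()
    result = []
    for s in cas9_starts:
        if s < 4:
            continue  # no room for the TTTV motif upstream
        if _CAS9_SITE.fullmatch(seq, s, s + 23) and _TTTV.fullmatch(seq, s - 4, s):
            result.append(s)
    return result
```

Filtering already-scored Cas9 targets, rather than rescanning the genome, is what allows the common sites to share the existing off-target data.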
Keywords/Search Tags: genome editing, target screen tool, off-target analysis, RGEN targets database