Individualized G-Quadruplex Detection Method Based On High-Throughput Sequencing

Posted on: 2022-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: W L Liu
Full Text: PDF
GTID: 2480306740479674
Subject: Biomedical engineering
Abstract/Summary:
G-quadruplex (G4) is a non-canonical nucleic acid secondary structure formed by the folding of guanine-rich sequences. A variety of methods have been reported to characterize and locate the structure, including methods based on biophysics, biochemistry, computational algorithms and next-generation sequencing. Because G4s can stall DNA polymerase, causing Phred scores to drop during DNA synthesis, our research group previously established a practicable procedure to mine G4s from next-generation sequencing data. However, that procedure used all covering reads to characterize quality and ignored the integrity of the potential G-quadruplex sequences. Taking the integrity of the G-quadruplex into account, this project screened sequencing reads and depth, established a procedure to detect canonical G-quadruplexes in the human genome, and explored the effect of single-nucleotide variants on G-quadruplex formation in order to detect individualized G-quadruplexes. A correlation analysis between individualized G-quadruplexes and transcription was also carried out. This project includes the following sections:

1. Construction of the genome-wide canonical G-quadruplex detection procedure
First, we used the g4predict software to predict potential quadruplex sequences (PQs) from the hg19 reference genome, obtaining 356,298 PQs. Second, requiring that each sequencing read include the entire PQ and that the sequencing depth reach 5 or more, we calculated the median quality score of each site from the filtered effective reads and identified the low-quality regions. Finally, we filtered out low-quality regions affected by other structures and determined the canonical G-quadruplexes. With this procedure, 341,103 PQs remained, and a total of 162,119 regions were detected as low-quality regions. After filtering out 47,046 low-quality regions with other structures, 115,073 PQs were detected as canonical G-quadruplexes (detected PQs, dPQs), accounting for 39.13% of all analyzed PQs.

2. Detection and analysis of individualized G-quadruplexes
Based on the single-nucleotide variant (SNV) information of the GM12878 cell line, we analyzed the genome-wide G-quadruplexes affected by SNVs and detected the individualized G-quadruplexes of GM12878. First, the reference genome was modified according to the homozygous SNVs of GM12878 and the PQs were re-predicted. A total of 356,690 PQs were predicted, with 2,264 PQs gained and 1,872 PQs lost relative to the reference. Second, using the canonical G-quadruplex detection procedure, 1,155 individualized G-quadruplexes affected by homozygous SNVs were detected, accounting for 54.97% of all gained PQs. Finally, the effect of heterozygous SNVs on G-quadruplexes was analyzed. Of the 4,381 dPQs detected in allele 1, 1,947 were not detected in allele 2, and of the 3,551 dPQs detected in allele 2, 1,117 were not detected in allele 1. The differing potential of the base types to form G-quadruplexes demonstrated that SNVs can affect the formation of the G-quadruplex structure.

3. Correlation analysis between individualized G-quadruplexes and transcription
To analyze whether the presence of G-quadruplexes affects transcription, we applied the above procedure to the K562 genome and compared the results with the individualized G-quadruplexes detected in the GM12878 cell line. In total, 226 and 269 individualized G-quadruplexes were detected in GM12878 and K562, respectively. Analysis of their genomic locations showed that 34 of them were located in exons, promoters or transcription start sites. Real-time fluorescent quantitative PCR and transcriptome sequencing were used to measure the transcripts containing individualized G-quadruplexes in the two samples. The results showed that transcripts containing individualized G-quadruplexes were more likely to be differentially expressed than those without, with a ratio more than one-fold higher, indicating that individualized G-quadruplexes play an important role in the transcription process. Individualized G-quadruplex analysis of different samples is of great value for exploring the role of G-quadruplexes in transcriptional regulation in individual genomes.
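
The read-screening step of the canonical detection procedure can be illustrated with a minimal Python sketch. Only the two criteria (a read must include the entire PQ, and the depth of such reads must reach 5 or more) and the per-site median quality come from the abstract. The `Read` class, the function names, the half-open coordinate convention and the assumption of ungapped alignments are hypothetical and are not taken from the thesis. The abstract does not say how low-quality regions are called from the medians, so the sketch stops at the medians.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional

MIN_DEPTH = 5  # from the abstract: sequencing depth should reach 5 or more


@dataclass
class Read:
    """Hypothetical, simplified aligned read (ungapped alignment assumed)."""
    start: int        # 0-based reference position of the first aligned base
    quals: List[int]  # Phred quality score of each aligned base

    @property
    def end(self) -> int:
        # Exclusive end coordinate on the reference
        return self.start + len(self.quals)


def spanning_reads(reads: List[Read], pq_start: int, pq_end: int) -> List[Read]:
    """Keep only reads that include the entire PQ interval [pq_start, pq_end)."""
    return [r for r in reads if r.start <= pq_start and r.end >= pq_end]


def per_site_median_quality(
    reads: List[Read], pq_start: int, pq_end: int, min_depth: int = MIN_DEPTH
) -> Optional[List[float]]:
    """Median Phred score at each PQ site, computed from spanning reads only.

    Returns None when fewer than min_depth reads span the whole PQ,
    i.e. the PQ is excluded from further analysis.
    """
    effective = spanning_reads(reads, pq_start, pq_end)
    if len(effective) < min_depth:
        return None
    return [
        median(r.quals[pos - r.start] for r in effective)
        for pos in range(pq_start, pq_end)
    ]
```

A read that overlaps the PQ only partially is ignored entirely, even at the sites it does cover. This is the difference from a procedure that uses all covering reads.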
Keywords/Search Tags: individualized G-quadruplex, next-generation sequencing, SNV, transcription