
Study on a DNA 6mA Quality Control Model Based on MeDIP-seq and SMRT-seq

Posted on: 2021-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Yang
Full Text: PDF
GTID: 2480306311484264
Subject: Biology
Abstract/Summary:
DNA N6-methyldeoxyadenosine (DNA 6mA) is a DNA methylation modification widely present in eukaryotic and prokaryotic genomes. It is involved in gene expression, DNA replication and repair, and host-pathogen interactions, and is usually closely associated with the restriction-modification (RM) system, which protects the host from invasion by foreign genomes. In recent years, with the rapid development of specific antibodies and high-throughput sequencing technology, many methods have been proposed to detect DNA 6mA events in eukaryotic and prokaryotic genomes, including methylated DNA immunoprecipitation sequencing (MeDIP-seq), restriction-enzyme-based DNA 6mA sequencing (RE-seq), single-molecule real-time sequencing (SMRT-seq), and nanopore sequencing (ONT-seq). MeDIP-seq can detect regions of the genome that may contain DNA 6mA sites, but it cannot locate those sites at single-nucleotide resolution. SMRT-seq monitors the pulsed fluorescent signal of single-nucleotide events, which enables genome-wide DNA 6mA localization at single-nucleotide resolution, but its false-positive DNA 6mA calls require a cost-effective quality control method. In this thesis, MeDIP-seq is used to help reduce the DNA 6mA false positives produced by SMRT-seq. Through experiments on 8 species, we systematically examined feature selection methods and cutoff values. The main research work is as follows:

1. We briefly introduced the research background and significance of DNA 6mA methylation modification, the progress of bioinformatics research, tissue distribution characteristics, and biological functions, and discussed the advantages and disadvantages of current DNA 6mA detection and analysis methods. This provides the theoretical foundation for the subsequent research. We then briefly described the data set used, and discussed in detail the MeDIP-seq detection process for DNA 6mA peaks, the SMRT-seq detection process for DNA 6mA sites, and the reliability of the detection results. This provides the data and technical support for this research.

2. We analyzed the correlation between the features of DNA 6mA sites and peaks. Based on the principle of DNA 6mA detection, six features were selected. The correlation between the quality control features and the threshold feature was systematically analyzed using normality tests and correlation analysis. We found that coverage, -log10(qvalue), and Fold Enrichment are weakly correlated with IPDRatio, with cutoff values of coverage ≥ 50, Fold Enrichment ≥ 1, and -log10(qvalue) ≥ 2, while Score and frac are highly correlated with IPDRatio, with cutoff values of Score ≥ 30 and frac ≥ 0.7.

3. We proposed a confidence-interval-based DNA 6mA quality control method named MASQC. First, we detected potential DNA 6mA sites by SMRT-seq, and then used the peak regions from MeDIP-seq to filter those sites and obtain the IPDRatio sample. Based on the central limit theorem and the sample distribution, we calculated the 95% confidence interval of the DNA 6mA IPDRatio population. For the Chlamydomonas reinhardtii genome, 10-fold cross-validation was used to evaluate the performance of MASQC. The average AUC was 0.925 and the average IPDRatio threshold was 4.54, which is broadly consistent with the experimental result of Fang et al. Six prokaryotic genomes were also selected for experiments. We compared the levels of the conserved sequence motif before and after quality control to evaluate the quality control effect. Non-motif sites were largely filtered out, which indicates that MASQC effectively reduced the DNA 6mA false positive rate.

The experimental results show that the proposed MASQC method can effectively control the DNA 6mA false positive rate without whole genome amplification (WGA) sequencing, thereby greatly reducing the cost of sequencing.
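The filtering and interval steps described in item 3 can be illustrated with a minimal Python sketch. It is not the MASQC implementation. The record layout (dictionaries with `chrom`, `pos`, `start`, `end`, and feature keys), the half-open peak coordinates, the function names, and the reading of the quoted cutoffs as lower bounds are all assumptions made for illustration. The abstract does not say how the confidence interval is turned into an IPDRatio threshold, so the sketch stops at the interval for the mean.

```python
import math
from statistics import NormalDist, mean, stdev

# Cutoff values quoted in the abstract, treated here as lower bounds (an assumption).
SITE_CUTOFFS = {"coverage": 50, "score": 30, "frac": 0.7}
PEAK_CUTOFFS = {"fold_enrichment": 1, "neg_log10_q": 2}


def passes(record, cutoffs):
    """Return True if every listed feature meets its minimum value."""
    return all(record[name] >= minimum for name, minimum in cutoffs.items())


def sites_in_peaks(sites, peaks):
    """Keep sites that pass the site cutoffs and lie inside a passing peak.

    Peaks are assumed to be half-open intervals [start, end) on a chromosome.
    The linear scan over peaks is fine for a sketch but slow for real genomes.
    """
    good_peaks = [p for p in peaks if passes(p, PEAK_CUTOFFS)]
    kept = []
    for site in sites:
        if not passes(site, SITE_CUTOFFS):
            continue
        if any(p["chrom"] == site["chrom"] and p["start"] <= site["pos"] < p["end"]
               for p in good_peaks):
            kept.append(site)
    return kept


def mean_confidence_interval(values, level=0.95):
    """Normal-approximation (central limit theorem) interval for the mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    half_width = z * stdev(values) / math.sqrt(n)
    centre = mean(values)
    return centre - half_width, centre + half_width
```

In use, one would pass the IPDRatio values of the sites returned by `sites_in_peaks` to `mean_confidence_interval`. With small samples a Student's t interval would be the more cautious choice than the normal approximation used here.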
Keywords/Search Tags: DNA 6mA, MeDIP-seq, SMRT-seq, eukaryotes, prokaryotes