
Research And Development Of The Simulated Platform For Detecting Genome Variation Based On Split Mapping

Posted on: 2014-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Zhang
GTID: 2250330422950592
Subject: Computer Science and Technology
Abstract/Summary:
With the development of high-throughput sequencing (HTS) technologies, whole-genome sequencing has become much faster and cheaper. The demand for bioinformatics methods that can process genome data at this scale is increasing, and detecting genome variation has become a core issue in the field.

Genome variation determines the differences between individuals and is closely related to disease. It includes insertions, deletions, inversions, and duplications. Variation detection tools report not only the variation event but also the position where it occurs. Widely used tools currently take paired-end data as input and rely on split-mapping results.

In this thesis, we introduce high-throughput data and the 1000 Genomes Project to explain where the large-scale data come from and why a simulated environment is important. We also describe the experimental environment and results for the widely used genome variation detection tools Pindel, Delly, Svseq2, and PRISM.

For the system, we present a detailed flow chart of how the simulated platform is built and explain the algorithms used in the process. By implanting genome variations into the reference sequence and tracing the record of every variation, the system produces simulated reads and alignment results, and uses the reads and traced records to compute the answer set. Running different detection tools at different sequencing coverages yields many result sets. By comparing them with the previously computed answer set, the system generates an evaluation report and an analysis report. The system also screens out the reads that exist only in the answer set or only in the result set. Researchers can divide these reads into clusters and, by analyzing the clusters, may propose new methods and algorithms for detecting genome variation.
Keywords/Search Tags: high-throughput sequencing, simulated data, detecting genome variation, bioinformatics
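The implant-and-compare workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the names `Variation`, `implant`, and `evaluate`, the non-overlap assumption, and the breakpoint `tolerance` are all hypothetical choices made for illustration, and read simulation and alignment are left out entirely.

```python
from dataclasses import dataclass

# Hypothetical record of one implanted or detected variation.
@dataclass(frozen=True)
class Variation:
    kind: str      # "DEL", "INS", "INV", or "DUP"
    pos: int       # 0-based start position on the reference
    length: int    # number of reference bases affected (0 for INS)
    seq: str = ""  # inserted bases (INS only)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def implant(reference, variations):
    """Apply variations to the reference and return the mutated sequence.

    Assumes the variations do not overlap. They are applied from right to
    left so that the reference coordinates of earlier ones stay valid.
    """
    out = reference
    for v in sorted(variations, key=lambda v: v.pos, reverse=True):
        end = v.pos + v.length
        segment = out[v.pos:end]
        if v.kind == "DEL":
            out = out[:v.pos] + out[end:]
        elif v.kind == "INS":
            out = out[:v.pos] + v.seq + out[v.pos:]
        elif v.kind == "INV":
            # Replace the segment with its reverse complement.
            out = out[:v.pos] + segment.translate(_COMPLEMENT)[::-1] + out[end:]
        elif v.kind == "DUP":
            # Tandem duplication: copy the segment right after itself.
            out = out[:end] + segment + out[end:]
        else:
            raise ValueError(f"unknown variation kind: {v.kind}")
    return out

def evaluate(answer_set, result_set, tolerance=5):
    """Compare a tool's calls with the answer set.

    A call counts as correct if an unmatched true variation of the same
    kind starts within `tolerance` bases (an assumed matching rule).
    Returns (precision, recall, missed true variations).
    """
    unmatched = list(answer_set)
    true_positives = 0
    for call in result_set:
        for truth in unmatched:
            if call.kind == truth.kind and abs(call.pos - truth.pos) <= tolerance:
                unmatched.remove(truth)
                true_positives += 1
                break
    precision = true_positives / len(result_set) if result_set else 0.0
    recall = true_positives / len(answer_set) if answer_set else 0.0
    return precision, recall, unmatched
```

In this sketch, the list passed to `implant` plays the role of the traced variation records, and the third value returned by `evaluate` lists the true variations that a tool missed, which is the kind of discrepancy the platform singles out for further analysis.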