
Algorithm For Virus Insertion Detection Using Whole Genome Sequencing Data

Posted on: 2017-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: C Hou
GTID: 2284330485969190
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Viruses are an important cause of disease. China has a large number of hepatitis B virus (HBV) carriers, which contributes to its high incidence of liver cancer. With the development of next-generation sequencing (NGS), it has become possible to study how HBV affects the human genome and the pathogenic mechanism of HBV integration.

This thesis develops a new method and software for locating virus integration sites by mapping NGS reads to the human genome, then filtering the reads and remapping them to the virus genome. The method uses human whole genome sequencing (WGS) data to detect virus insertion events:
1. Soft-clipped reads are mapped several times to detect integration sites.
2. Read pairs whose mates map to different reference genomes are located on the human and virus genomes separately, giving approximate integration sites.
3. These sites are evaluated using the MAPQ values from each mapping.
4. A one-sided binomial test is carried out on the coverage of properly mapped reads to evaluate the probability that an integration site results from soft-clipped reads.
5. For sites that do not pass the test, the surrounding reads are assembled de novo and searched with BLAST to study the causes and distribution of questionable sites.

WGS data from 24 pairs of cancer and normal samples were analyzed to test the software. HBV insertions in the TERT gene on chromosome 5 were found in 11 cancer samples and in none of the normal samples. TERT is closely related to telomerase function.
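As a rough illustration of the soft-clip detection and one-sided binomial test described in the abstract, the following Python sketch shows one possible reading. It is not the thesis software: the function names, the use of the soft-clipped read count as the test statistic, and the `background_rate` parameter (an assumed chance that a read at a non-integrated site is soft-clipped) are all assumptions made for illustration.

```python
import re
from math import comb

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clipped lengths for a SAM CIGAR string."""
    # Hard clips (H) may sit outside the soft clips, so drop them first.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def site_p_value(soft_clipped, properly_mapped, background_rate):
    """One-sided p-value for a candidate site (hypothetical formulation).

    Assumption: under the null hypothesis each read covering the site is
    soft-clipped independently with probability `background_rate`.
    """
    total = soft_clipped + properly_mapped
    return binomial_upper_tail(soft_clipped, total, background_rate)


if __name__ == "__main__":
    print(soft_clip_lengths("30S70M"))
    print(site_p_value(soft_clipped=8, properly_mapped=32, background_rate=0.05))
```

A small p-value here would indicate more soft-clipped reads at the site than the assumed background rate explains; the threshold and the background rate actually used in the thesis are not stated in the abstract.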
Keywords/Search Tags: Virus integration, NGS, whole genome sequencing, de novo assembly