Methods And Analysis On Cancer Genome And Transcriptome Sequencing Data

Posted on: 2012-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Li
GTID: 2284330467489021
Subject: Bioinformatics
Abstract/Summary:
Structural variation is an important source of human genetic diversity. It includes insertions, deletions, inversions, translocations and copy number variation. Detecting and characterizing these structural variations is important for understanding genetic changes among species, including their evolutionary history, and complex diseases in human populations, especially cancer. With advances in next-generation sequencing, paired-end libraries enable more sensitive and accurate detection methods, such as those based on read depth and on clusters of paired-end mappings. Because the human genome contains many repeat regions and experimental errors can affect sequencing data quality, many methods cannot accurately determine deletion breakpoints and lengths. Based on the mate-pair insert size distribution, we developed a method to detect deletions using the R language. Comparing our method with BreakDancer and PEMer on simulated data, we found that our method performs slightly better in both sensitivity and specificity. We applied our method to genome data from one liver cancer case and discovered 60 cancer-specific deletions, some of which were validated by PCR. Importantly, several key tumor suppressor genes, such as APC and MCC, lay within the 5q deletion region; in addition, the two breakpoints of one deletion were located in C5orf51 and CPEB4, producing a C5orf51-CPEB4 fusion gene.

Compared with the cancer genome, the transcriptome characterizes a time- and space-specific physiological state and allows cancer to be analyzed from many aspects, such as somatic mutations, allele-specific expression, fusion genes, differential gene expression, alternative splicing and pathways. RNA-Sequencing is becoming a revolutionary tool for studying the transcriptome; it focuses on changes in coding regions. We used RNA-Sequencing to analyze 9 paired lung cancer samples, each consisting of tumor tissue and surrounding normal tissue. We then identified specific somatic mutations, differentially expressed genes, fusion genes and other alterations. Comparing our gene list with the public database COSMIC, we found several common and specific variations in lung cancer, such as alterations in TP53 and EGFR, an SDF4 variant occurring at the same locus in 3 samples, and changes in the non-small cell lung cancer pathway.
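To illustrate the general idea of detecting deletions from the insert size distribution, here is a minimal Python sketch. It is not the thesis's R implementation, whose details the abstract does not give. All names and parameters (`call_deletions`, `k`, `max_gap`, `min_support`) are hypothetical, and the median/MAD cutoff is an assumed stand-in for whatever statistic the author actually used. Each read pair is represented only by the start and end of its mapped fragment on the reference.

```python
from statistics import median


def robust_center_and_spread(spans):
    """Median and MAD-based spread of the mapped fragment spans.

    The factor 1.4826 scales the MAD so it is comparable to a standard
    deviation under normality; the floor of 1.0 avoids a zero-width cutoff.
    """
    center = median(spans)
    mad = median([abs(s - center) for s in spans])
    return center, max(1.4826 * mad, 1.0)


def call_deletions(pairs, k=3.0, max_gap=500, min_support=2):
    """Call candidate deletions from (fragment_start, fragment_end) pairs.

    A pair whose mapped span is much larger than the library's typical
    insert size suggests that sequence present in the reference is missing
    from the sample between the two mates.
    """
    spans = [end - start for start, end in pairs]
    center, spread = robust_center_and_spread(spans)
    cutoff = center + k * spread  # assumed threshold, not from the thesis

    # Keep only pairs that map too far apart, ordered by start position.
    discordant = sorted(p for p in pairs if p[1] - p[0] > cutoff)

    # Group discordant pairs whose start positions lie close together.
    clusters, current = [], []
    for pair in discordant:
        if current and pair[0] - current[-1][0] > max_gap:
            clusters.append(current)
            current = []
        current.append(pair)
    if current:
        clusters.append(current)

    calls = []
    for cluster in clusters:
        if len(cluster) < min_support:
            continue  # a single stretched pair is more likely noise
        calls.append({
            "region_start": min(s for s, _ in cluster),
            "region_end": max(e for _, e in cluster),
            # Excess span over the typical insert size estimates the length.
            "est_length": median([e - s for s, e in cluster]) - center,
            "support": len(cluster),
        })
    return calls


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    demo = [(0, 290), (100, 405), (200, 500), (300, 610), (400, 695),
            (5000, 6300), (5020, 6310), (5050, 6350)]
    for call in call_deletions(demo):
        print(call)
```

The sketch only bounds the deleted region by the outermost supporting pairs. Locating the breakpoints precisely, which the abstract names as a weakness of many methods, would need additional evidence such as the individual mate positions.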
Keywords/Search Tags: insert size, methods for detecting structural variations, RNA-Sequencing, lung cancer