
The Development of Methods for Inferring Purity, Copy Number of Genes and Alleles from Tumor Tissues

Posted on: 2023-07-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X P Fan
GTID: 1524306809473634
Subject: Drug design
Abstract/Summary:
Tumor purity is the proportion of cancer cells in a tumor sample. In studies that use next-generation sequencing to analyze the molecular and genomic features of tumor tissue, tumor purity has a substantial impact on the results and may alter their biological and clinical interpretation, leading to false-positive and false-negative findings. Accurate inference of tumor purity is a prerequisite for inferring gene copy number and allele-specific copy number in the tumor genome. Genomic alterations discovered in large-scale cancer genomics projects have therapeutic implications and are an important resource for drug development. Copy number alterations, owing to their large impact on the genome, are an important contributing factor to oncogenesis and metastasis. Tumor phylogenies constructed from gene copy number and allele-specific copy number profiles at different cancer stages have become an important tool for studying metastasis, drug resistance, and biomarker discovery. Although many algorithms exist to infer tumor purity, gene copy number, and allele-specific copy number, most require some expertise and are suited to sequencing data with high sequencing depth and high tumor purity. Accurately inferring these quantities from sequencing data with low sequencing depth and low tumor purity remains a challenging task. We therefore introduce Accurity and Accucopy.

Accurity is a statistical model of our own design that accurately infers tumor purity from sequencing data at ultra-low sequencing depth and ultra-low tumor purity. In designing Accurity, we identified a periodic pattern in deep sequencing data and built it into the model. To solve for the periodicity, we used methods from the field of time series analysis to design a two-stage optimization search, which reduced the feasible solution space of the model and improved computational efficiency and accuracy. We also denoised the data using kernel smoothing methods from the field of signal processing and used the Bayesian Information Criterion to control model overfitting. These improvements raised the signal-to-noise ratio of the sequencing data and optimized the parameter-solving procedure, resolving the computational failures caused by the low signal-to-noise ratio of data at low sequencing depth and low tumor purity. For testing, we designed a pipeline that simulates tumor tissue sequencing data, based on EAGLE, the sequencing data simulator developed by Illumina. Because their true results are known, the simulations generated by this pipeline were of great help in designing and testing the algorithm. We also compared Accurity with state-of-the-art algorithms on simulated and TCGA datasets to demonstrate its accuracy and superiority.

Accucopy is likewise a statistical model of our own design. It accurately infers the copy number of genes and alleles in the tumor genome from sequencing data at ultra-low sequencing depth and ultra-low tumor purity. In Accucopy, the periodicity in deep sequencing data is used to solve directly for the total copy number of genomic segments, which avoids the excessive model complexity that would result from parameterizing those total copy numbers. We found that SNV data are biased at low sequencing depth and low tumor purity, which explains why peer algorithms cannot accurately infer allele-specific copy number under these conditions; correcting the SNV data improved the accuracy of Accucopy. In the allele-specific copy number solver, constraints derived from the model's characteristics were added to the EM algorithm to reduce the feasible solution space and improve solving speed and accuracy. We also used results from the 1000 Genomes Project as priors to make the inference from SNVs more robust. For testing, in addition to the simulated and TCGA datasets, we introduced the HCC1187 tumor cell line dataset as a real dataset with partially known results, which was of great help in designing and testing Accucopy. We also compared Accucopy with state-of-the-art algorithms on these datasets to demonstrate its accuracy and superiority.

The main strength of Accurity and Accucopy is their ability to accurately infer tumor purity, gene copy number, and allele-specific copy number from sequencing data at low sequencing depth and low tumor purity. In comparative analyses with peer algorithms on both simulated and real sequencing datasets, Accurity shows higher accuracy and robustness than ABSOLUTE and Sequenza. Accucopy shows higher accuracy and robustness than ABSOLUTE, Sequenza, and Sclust, and is more suitable for pan-cancer projects than Sclust. To keep runtime low and memory usage efficient, the core program was implemented in C++ and Rust. To make the program portable and easy to use, it was packaged with Docker, a popular lightweight container platform. In theory, the Dockerized application can run on any Unix- or Linux-based operating system. For HPC users, Docker images can also be easily converted to Singularity images. Accurity and Accucopy are easy-to-use, well-documented programs that non-experts in computational biology or bioinformatics can use with ease. Since their release, Accurity and Accucopy have been downloaded about 2000 times in total. Registration information shows that users include the Chinese National Human Genome Center at Shanghai, University of Chinese Academy of Sciences, Peking University, Harvard University, the National Institutes of Health (NIH), Massachusetts Institute of Technology, and other research institutes. We hope that Accurity and Accucopy can be used in clinical oncology projects to correct analytical results and discover tumor biomarkers.
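
The periodic pattern mentioned in the abstract can be illustrated with the standard tumor/normal mixture relation. This is a simplified sketch, not the Accurity or Accucopy model itself. It assumes diploid normal cells, a known average tumor ploidy, and that the copy number of one peak is known; all function names are hypothetical. Under these assumptions, the expected read depth ratio of a segment grows linearly with its tumor copy number, so the peaks for successive integer copy numbers are equally spaced, and the spacing depends on purity.

```python
def genome_mean_copies(purity, tumor_ploidy=2.0, normal_copy=2):
    """Average number of DNA copies per cell in the mixed tumor/normal sample."""
    return purity * tumor_ploidy + (1.0 - purity) * normal_copy


def expected_depth_ratio(purity, tumor_copy, tumor_ploidy=2.0, normal_copy=2):
    """Expected depth of a segment relative to the genome-wide mean depth.

    Simplifying assumption: depth is proportional to the number of DNA copies
    contributed by tumor cells (fraction `purity`) and normal cells.
    """
    segment_copies = purity * tumor_copy + (1.0 - purity) * normal_copy
    return segment_copies / genome_mean_copies(purity, tumor_ploidy, normal_copy)


def peak_period(purity, tumor_ploidy=2.0, normal_copy=2):
    """Spacing between depth-ratio peaks of consecutive integer copy numbers."""
    return purity / genome_mean_copies(purity, tumor_ploidy, normal_copy)


def purity_from_peaks(peak_ratio, period, peak_copy, normal_copy=2):
    """Recover purity from one peak position, the period, and that peak's copy number.

    From the mixture relation:
        peak_ratio / period = peak_copy + normal_copy * (1 - purity) / purity
    Assumes `peak_copy` is known, which a real method would have to infer.
    """
    offset = peak_ratio / period - peak_copy
    return normal_copy / (offset + normal_copy)


if __name__ == "__main__":
    purity = 0.3
    ratios = [expected_depth_ratio(purity, c) for c in range(5)]
    gaps = [b - a for a, b in zip(ratios, ratios[1:])]
    print("depth ratios:", [round(r, 4) for r in ratios])
    print("peak gaps:   ", [round(g, 4) for g in gaps])
    recovered = purity_from_peaks(ratios[2], peak_period(purity), peak_copy=2)
    print("recovered purity:", round(recovered, 4))
```

The sketch also shows why low purity is hard: the peak spacing shrinks as purity falls, so noise in the depth signal blurs neighboring peaks together. This is consistent with the abstract's emphasis on denoising and on raising the signal-to-noise ratio.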
Keywords/Search Tags: Cancer, Next-generation sequencing, Tumor purity, Copy number