Detection Of Tumor Purity, Ploidy And Copy Number Variation Based On Next-generation Sequencing Data

Posted on:2022-10-04

Degree:Master

Type:Thesis

Country:China

Candidate:J H Xia

GTID:2504306602966869

Subject:Master of Engineering

Abstract/Summary:

The detection of pathological characteristics such as tumor purity, ploidy and copy number variation (CNV) plays an important role in identifying pathogenic genes and related treatments. Next-generation sequencing (NGS) has been widely used in cancer genomics research because of its high throughput, high resolution and low cost. The reads produced by NGS are relatively short, so the volume of data is very large. To analyze this type of data effectively, three problems need to be solved: (1) how to extract features from the sequencing data; (2) which method to use to analyze the data; and (3) how to properly quantify the strong correlation between tumor purity, genome aneuploidy and copy number variation.

Existing algorithms do not achieve satisfactory results when sequencing coverage and tumor purity are low. In addition, many copy number variation detection algorithms rely mainly on the read depth signal of each window for anomaly analysis and do not effectively incorporate other information, so they cannot fully capture the characteristics of the window. Building on previous studies, this thesis designs two algorithms that address these problems by exploiting the characteristics of NGS data. The main research contents and results are as follows:

1. A method, CNV_LGB, is proposed to detect copy number variation from short-read sequencing data. CNV_LGB is based on the read-depth strategy: it extracts features for each window and uses the machine learning model LightGBM to classify abnormal windows. First, CNV_LGB applies general preprocessing to the sequencing data. Second, it extracts multiple features for each window, uses an existing detection model to obtain a set of the more reliable variant regions and normal regions, and adds these regions to the data set as labels. Finally, the supervised model LightGBM classifies the abnormal windows, which are then used to determine the copy number variation regions. CNV_LGB has two main advantages: 1) transforming an unsupervised anomaly detection problem into a supervised imbalanced classification problem helps to overcome the influence of abnormal data on the results; 2) extracting multiple features from the sequencing data captures the characteristics of each window along several dimensions, which improves the accuracy of the classification.

2. A detection algorithm, Turp Aplo, is proposed for tumor purity and average ploidy. The method first locates copy number deletion regions and then determines the specific deletion type by comparing differences in the read depth signal. Finally, it relates the expected read depth, the observed read depth, tumor purity and average ploidy, and iteratively calculates the purity and average ploidy of the tumor sample using the characteristics of loss of heterozygosity. The advantage of this algorithm is that copy number deletion regions come in only two types, homozygous deletion and heterozygous deletion, so a more concise model can be established, which improves detection efficiency. Comparative experiments show that the algorithm performs well on both simulated and real data.
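
The link between read depth, tumor purity and average ploidy described for Turp Aplo can be illustrated with a minimal Python sketch. The abstract does not state the thesis's equations, so the sketch assumes the commonly used linear mixture model, in which the depth of a region is the purity-weighted sum of tumor and normal copies, normalized by the same quantity computed with the average ploidy. The function names `expected_depth_ratio` and `purity_from_deletion` are hypothetical, and the thesis's iterative update of purity and ploidy is not reproduced here.

```python
def expected_depth_ratio(purity, tumor_copies, avg_ploidy, normal_copies=2):
    """Expected read depth of a region relative to the sample-wide mean.

    Assumed mixture model (not taken from the thesis):
        r = (p*c + (1-p)*n) / (p*psi + (1-p)*n)
    p: tumor purity, c: tumor copy number in the region,
    psi: average tumor ploidy, n: copy number in normal cells.
    """
    region = purity * tumor_copies + (1.0 - purity) * normal_copies
    genome = purity * avg_ploidy + (1.0 - purity) * normal_copies
    return region / genome


def purity_from_deletion(depth_ratio, tumor_copies, avg_ploidy, normal_copies=2):
    """Solve the mixture model above for purity, with ploidy held fixed.

    Rearranging r = (p*c + (1-p)*n) / (p*psi + (1-p)*n) gives
        p = n*(1 - r) / (r*(psi - n) + n - c)
    tumor_copies is 0 for a homozygous deletion, 1 for a heterozygous one.
    """
    denom = depth_ratio * (avg_ploidy - normal_copies) + normal_copies - tumor_copies
    if denom == 0:
        raise ValueError("purity is not identifiable from this region")
    return normal_copies * (1.0 - depth_ratio) / denom


if __name__ == "__main__":
    # Heterozygous deletion (one copy left) in a tumor with 60% purity.
    r = expected_depth_ratio(purity=0.6, tumor_copies=1, avg_ploidy=2.4)
    print(round(r, 4))
    print(round(purity_from_deletion(r, tumor_copies=1, avg_ploidy=2.4), 4))
```

Because a deletion region can only have zero or one remaining tumor copy, the inversion needs just two cases, which is consistent with the abstract's point that restricting attention to deletions keeps the model concise.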

Keywords/Search Tags:

Next-generation sequencing technology, copy number variation, tumor purity, average ploidy, machine learning

Related items

1	The Development Of Methods For Inferring Purity, Copy Number Of Genes And Alleles From Tumor Tissues
2	Research On Detection Of Copy Number Variation Based On Next Generation Sequencing Technology
3	Calling Genomic Copy Number Variation Based On Deep Learning
4	Research On Cancer Copy Number Variation Detection Methods For Next-Generation Sequencing Data
5	Machine Learning-based Impact Prediction Tool For Copy-number Variation.
6	Study On Detection Algorithms For Tumor Genomic Copy Number Alterations Based On Next-Generation Sequencing
7	Accurate Inference Of Tumor Purity And Absolute Copy Number From High-throughput Sequencing Data
8	Clinical Application Of Non-invasive Prenatal Testing Technology For Fetal Chromosome Copy Number Variation Detection
9	Application Of Next-generation Sequencing Technology In Chromosomal Abnormalities Diagnosis Of Early Pregnancy Abortion
10	The Performance Of Whole Genome Amplification Methods And Clinical Translation Of Embryo Copy Number Variation Sequencing To Pre-implantation Genetic Diagnosis