Cancer Classification Based On Transcriptome Fluorescence Sequencing Data Integration Strategy

Posted on: 2022-09-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Y Ning
GTID: 1484306317989409
Subject: Measuring and Testing Technology and Instruments
Abstract/Summary:
Laser-induced fluorescence sequencing is the most widely used technology for detecting biological data. Twenty-five years after the invention of the first generation of laser-induced fluorescence sequencing, megapixel sensors can now track laser-induced fluorescence in parallel for sequencing. The technology has advanced research on complex diseases across multi-omics data, and the identification of cancer biomarkers is an important application of transcriptome sequencing. Such biomarkers can assist clinical diagnosis and personalized cancer treatment.

For heterogeneous diseases, however, the reproducibility of single-molecule biomarkers is a major challenge, and such biomarkers are hard to interpret in terms of biological mechanism at the functional and system levels. The development of pathway databases has made it possible to identify integrated biomarkers for classification, and classification methods based on pathway gene-set functional enrichment and on topological networks have been proposed. Enrichment-based methods identify classification features through statistical analysis of specific gene sets and therefore have clear statistical significance, but they are poorly interpretable biologically because they consider only the number of genes in a gene set and ignore the measured values of individual genes. Network-based methods identify classification features in a larger biological network and use machine learning and statistical analysis to detect classification biomarkers, which are more biologically interpretable. Most network-based methods, however, use undirected networks, ignore the importance of dysregulated subpathways, and rarely integrate data from multiple types of molecules. In addition, research on classification methods has focused on developing complex algorithms and has neglected the construction of robust features.

To address these problems, this dissertation proposes a machine learning workflow based on miRNA-mediated subpathways, called miRNA-mediated directed random walk (miDRW), and a survival analysis method based on it, called miDRWSuv. It also studies the classification biomarkers identified by these methods. The work comprises four parts:

1. We constructed a global directed pathway network (GDPN) by merging compounds and enzymes according to the regulatory information in the KEGG pathway database, and added a virtual gene node to resolve the hitch points of the GDPN. We proposed a weighted directed random walk with biologically meaningful parameter values. The results showed that the weighted directed random walk increased the topological weights of hub genes in the GDPN.
2. We defined miRNA-mediated subpathways and constructed a mathematical model of miRNA-mediated subpathway activity. The model is constrained by biological regulation, and it computes the activity value of a miRNA-mediated subpathway by integrating the differential levels of miRNAs and mRNAs, mRNA expression, miRNA-mRNA target relationships, and the topological information of the GDPN. The results showed that miRNA-mediated subpathways can distinguish sample classes.
3. We proposed a workflow that incorporates multiple machine learning methods. It uses statistical analysis to detect candidate features and a greedy algorithm to identify cancer-specific dysregulated subpathways. The results showed that the workflow is independent of the choice of machine learning model and can identify robust classification features using classic machine learning methods.
4. We proposed an adaptive weighted Lasso-Cox model to identify survival biomarkers. The model learns the relationship between miRNA-mediated subpathway activity values and survival time, and its adaptive parameters allow survival biomarkers to be identified rapidly. The results showed that the method identified survival biomarkers of breast cancer that significantly separated patients into high-risk and low-risk groups.

Large-scale experiments showed that miRNA-mediated subpathways are robust and biologically interpretable, and that the classification method based on them generalizes well and classifies stably.
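The weighted directed random walk on the GDPN (part 1) is, in outline, a random walk with restart on a directed graph. The sketch below is a minimal illustration under stated assumptions, not the dissertation's implementation. Assumptions: edges are (regulator, target) pairs and the walker steps against edge direction so that genes regulating many others gain weight; "weighted" is taken to mean non-uniform restart weights (for example, absolute differential-expression statistics); the restart probability of 0.7 is an assumed default; and "hitch points" are assumed to be dead-end genes with no outgoing step, handled here by a hypothetical virtual node `__virtual__` that spreads their mass evenly back over all genes.

```python
from typing import Dict, List, Tuple

# Hypothetical virtual gene node that catches mass from dead-end genes.
VIRTUAL = "__virtual__"


def directed_random_walk(
    edges: List[Tuple[str, str]],
    seed_weights: Dict[str, float],
    restart: float = 0.7,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Random walk with restart on a directed gene network.

    edges are (regulator, target) pairs. The walker steps from a target back
    to its regulators (reversed edges), so genes that regulate many others
    accumulate weight. seed_weights are the non-negative restart weights.
    """
    genes = sorted({g for edge in edges for g in edge} | set(seed_weights))
    # Reversed adjacency: out[target] lists the regulators of that target.
    out: Dict[str, List[str]] = {g: [] for g in genes}
    for regulator, target in edges:
        out[target].append(regulator)
    # Dead-end genes pass their mass to the virtual node, which spreads it
    # evenly over all genes, so total probability mass is conserved.
    for g in genes:
        if not out[g]:
            out[g].append(VIRTUAL)
    out[VIRTUAL] = list(genes)

    total = sum(seed_weights.get(g, 0.0) for g in genes)
    if total <= 0:
        raise ValueError("seed weights must have a positive sum")
    p0 = {g: seed_weights.get(g, 0.0) / total for g in genes}
    p0[VIRTUAL] = 0.0
    p = dict(p0)

    for _ in range(max_iter):
        # p_{t+1} = (1 - r) * M p_t + r * p_0
        nxt = {node: restart * p0[node] for node in p0}
        for node, mass in p.items():
            share = (1.0 - restart) * mass / len(out[node])
            for neighbour in out[node]:
                nxt[neighbour] += share
        delta = sum(abs(nxt[n] - p[n]) for n in p)
        p = nxt
        if delta < tol:
            break
    # Report only real genes; the virtual node is an implementation device.
    return {g: p[g] for g in genes}
```

For the toy network A->B, A->C, B->C with equal seed weights, `directed_random_walk([("A", "B"), ("A", "C"), ("B", "C")], {"A": 1.0, "B": 1.0, "C": 1.0})` ranks the upstream regulator A highest, then B, then C, which is the hub-boosting effect described in part 1.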
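Part 2 computes a subpathway activity value per sample from topological weights, differential levels and expression. The sketch below uses the common DRW-style score, activity_j = sum_g w_g * sign(t_g) * z_gj / sqrt(sum_g w_g^2), where w_g is a topological weight, t_g a differential statistic and z_gj the standardised expression of molecule g in sample j. This formula is an assumption for illustration; the dissertation's model additionally uses miRNA-mRNA target relationships and regulatory constraints that are not reproduced here, and the miRNA is simply treated as one more member.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def zscore(values: Sequence[float]) -> List[float]:
    """Standardise one molecule's expression across samples."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def subpathway_activity(
    members: Sequence[str],
    weights: Dict[str, float],
    t_stats: Dict[str, float],
    expression: Dict[str, Sequence[float]],
) -> List[float]:
    """DRW-style activity of one subpathway in every sample.

    members: miRNA and mRNA identifiers in the subpathway.
    weights: topological weights (e.g. from a directed random walk).
    t_stats: differential statistics; only their sign is used.
    expression: per-molecule expression values, one per sample.
    """
    norm = math.sqrt(sum(weights[m] ** 2 for m in members))
    if norm == 0:
        raise ValueError("members must have at least one non-zero weight")
    z = {m: zscore(expression[m]) for m in members}
    n_samples = len(z[members[0]])
    activity = []
    for j in range(n_samples):
        # Sign-align each member so up- and down-regulated members add up.
        s = sum(
            weights[m] * math.copysign(1.0, t_stats[m]) * z[m][j]
            for m in members
        )
        activity.append(s / norm)
    return activity
```

Multiplying by the sign of each member's statistic means a down-regulated miRNA and its up-regulated target push the activity in the same direction, so the score summarises coordinated dysregulation rather than letting the two cancel.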
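Part 3 selects dysregulated subpathways from statistically filtered candidates with a greedy algorithm. A minimal sketch of greedy forward selection follows: at each step it adds the candidate feature that most improves leave-one-out accuracy and stops when no candidate helps. The nearest-centroid classifier and the names `loo_accuracy` and `greedy_forward_selection` are hypothetical stand-ins chosen to keep the example dependency-free; since the workflow is described as model-independent, any classifier could take its place.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Dict[str, float]  # feature name -> subpathway activity value


def loo_accuracy(X: Sequence[Sample], y: Sequence[Hashable], features: List[str]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(y)):
        sums: Dict[Hashable, List[float]] = {}
        counts: Dict[Hashable, int] = {}
        for j in range(len(y)):
            if j == i:
                continue  # hold sample i out of training
            vec_sum = sums.setdefault(y[j], [0.0] * len(features))
            for k, f in enumerate(features):
                vec_sum[k] += X[j][f]
            counts[y[j]] = counts.get(y[j], 0) + 1
        # Predict the class whose centroid is closest in squared distance.
        best_label, best_dist = None, float("inf")
        for label, vec in sums.items():
            dist = sum(
                (X[i][f] - vec[k] / counts[label]) ** 2
                for k, f in enumerate(features)
            )
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += int(best_label == y[i])
    return correct / len(y)


def greedy_forward_selection(
    X: Sequence[Sample],
    y: Sequence[Hashable],
    candidates: Sequence[str],
    max_features: int = 10,
) -> Tuple[List[str], float]:
    """Add the single best candidate feature until accuracy stops improving."""
    selected: List[str] = []
    best_acc = 0.0
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in remaining]
        acc, feature = max(scored, key=lambda pair: pair[0])
        if acc <= best_acc:
            break  # no candidate improves the current subset
        selected.append(feature)
        remaining.remove(feature)
        best_acc = acc
    return selected, best_acc
```

Requiring a strict improvement before adding a feature keeps the selected set small, which is one simple way to favour the robust, compact feature sets the workflow aims for.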
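For reference, the standard adaptive Lasso penalty for the Cox model is shown below; the dissertation's adaptive weighted Lasso-Cox model in part 4 may set its weights differently. Here x_i is assumed to be the vector of miRNA-mediated subpathway activity values for patient i, t_i the survival time, delta_i the event indicator, R(t_i) the set of patients still at risk at t_i, beta-tilde an initial estimate (for example, an unpenalised or ridge Cox fit), and gamma > 0. The final line gives a linear risk score; splitting patients at its median into high-risk and low-risk groups is an assumed, commonly used choice.

```latex
\ell(\beta) = \sum_{i:\,\delta_i = 1} \Big[ \mathbf{x}_i^\top \beta
  - \log \sum_{k \in R(t_i)} \exp\big(\mathbf{x}_k^\top \beta\big) \Big]

\hat{\beta} = \arg\min_{\beta} \Big\{ -\ell(\beta)
  + \lambda \sum_{j=1}^{p} w_j \, \lvert \beta_j \rvert \Big\},
\qquad w_j = \frac{1}{\lvert \tilde{\beta}_j \rvert^{\gamma}}

r_i = \mathbf{x}_i^\top \hat{\beta}
```

Features with large initial coefficients receive small penalties and are kept, while weak features are penalised heavily and shrink to zero, so the nonzero entries of beta-hat act as the candidate survival biomarkers.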
Keywords: fluorescence sequencing, random walk, adaptive Lasso estimation, topological structure, transcriptome