Research on Algorithms for Detecting Omics Biomarkers of Cancer Progression Stages

Posted on: 2019-11-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Ren
GTID: 1364330548956776
Subject: Computer Science and Technology
Abstract/Summary:
Cancer remains one of the deadliest diseases threatening human health, and no complete cure exists anywhere in the world. Doctors can only judge the degree of cure from a patient's clinical treatment and prognosis. Prognosis is evaluated in two ways: the 5-year survival rate and mortality. Cancer staging describes the degree of invasion and spread, and cancers are normally divided into four stages. Generally speaking, stages I and II are early stages with a good chance of cure, whereas stages III and IV are late stages with a low 5-year survival rate. With the implementation of the Human Genome Project and the vigorous development of biological big data, molecular-level knowledge of cancer has extended from genomics to proteomics and epigenomics, that is, to multiple omics levels. Classification algorithms are used to differentiate early-stage from late-stage samples in order to find more effective biomarkers, which can promote targeted gene therapy for cancer. Targeted gene therapy has helped where radiotherapy, chemotherapy, immunotherapy, surgery, and other treatments have reached a bottleneck, and multi-omics research will provide further possibilities for tumor treatment. A common statistical machine learning workflow is to first apply a machine learning method to biological big data, then optimize the model, and finally use the model to analyze and predict data. This dissertation uses biological big data, including genomic, proteomic, and epigenomic data. Where sequencing technology allows, cross-analysis of multi-omics data will give a more specific understanding of tumors and screen out more useful tumor biomarkers. Tumor biomarker prediction and detection is a hot topic in statistical machine learning research; its main techniques are feature selection, classification, and regression. This study applies
statistical and machine learning methods on three types of omics data, aiming to find more effective and accurate biomarkers of tumor stage, thereby providing guidance for clinical treatment and helping to improve prognosis and survival rates.

Lung cancer is the most common cancer in the world today and, according to the latest studies, has the highest mortality rate of all cancers. Lung cancer is also one of the most representative cancers, which is why this work studies it first. Treatment of lung cancer has made great progress in recent years, but clinical outcomes are still not ideal: according to statistics, the recurrence rate is 15% to 30% and the 5-year survival rate is 60% to 70%. Lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) are the two major subtypes of lung cancer. Lung cancer patients live longer and suffer less if diagnosed at an early stage and treated properly. This study conducted a comprehensive screening of protein biomarkers for predicting lung cancer progression stages. The proteomic data were first pre-processed by (0,1)-normalization. The feature selection algorithm SVM-RFE was used to screen candidate biomarkers, and redundant features were removed by a backward k-feature selection procedure. The three-class discrimination model built on 28 protein biomarkers achieved 86.51% accuracy for LUAD, and the model built on 41 protein biomarkers achieved 89.47% accuracy for LUSC. Five protein biomarkers appear in both the LUAD and LUSC models. Transcriptomic models used 34 and 38 transcript biomarkers to achieve three-class accuracies of 99.20% and 100% for LUAD and LUSC, respectively. Epigenomic models used 43 and 36 epigenetic biomarkers to achieve three-class accuracies of 93% and 89% for LUAD and LUSC, respectively. These two comparative analyses indicate that the proposed biomarkers provide discriminative power for accurate stage prediction and will be
improved when larger-scale quantitative proteomic technologies become available.

Kidney cancer has in recent years been one of the most common malignant tumors of the urinary system, and clear cell carcinoma of the kidney (KIRC) is the most common kidney cancer. The incidence and fatality rate of kidney cancer are rising year by year, and recent research has shown that targeted therapy can extend the survival time of kidney cancer patients to 50 months. Because methylation is reversible, it acts as a switch for gene expression, and DNA methylation sites can serve as targets for targeted therapy in tumor prevention and treatment, so studying methylation data sets is necessary. Studying kidney cancer staging biomarkers can help doctors devise specific treatment plans for different patients and can identify statistically significant methylation residues, which is of great importance as a guide for clinical practice and targeted medication. The machine learning algorithms above do not perform well on this methylation data set, reaching a highest classification accuracy of only 76.46%. Based on an analysis of the specific characteristics of kidney cancer, this work proposes a kidney cancer staging study based on gender differences. It comprehensively compares seven feature selection algorithms, such as the t-test, and seven classification algorithms, such as SVM, on gender-specific methylation data sets (male and female) and on the both-gender methylation data set. The results show that simply dividing the data set into gender-specific data sets and using the incremental feature selection strategy based on the t-test effectively improves the classification performance of the support vector machine (SVM). Beyond SVM, the t-test-based incremental feature selection strategy also effectively improves the other six classifiers considered. The evaluation data for six feature selection algorithms also support gender specificity. The results show that
a logistic regression model trained with the Tri Vote algorithm on the methylation data set achieves the best performance. Finally, a model trained on one gender's data was applied to the other gender's data; the cross-validation results show that the gender-specific models are independent of each other.
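
The incremental feature selection strategy based on the t-test, as described in the abstract, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the data are synthetic, every function name is hypothetical, the problem is reduced to two classes (early vs. late stage), and a nearest-centroid classifier stands in for SVM so that the sketch needs only the Python standard library.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic for one feature measured in two groups."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_features(X, y):
    """Feature indices ordered from most to least discriminative by |t|."""
    scores = []
    for j in range(len(X[0])):
        early = [row[j] for row, label in zip(X, y) if label == 0]
        late = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(welch_t(early, late))
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def fit_centroids(X, y, feats):
    """Per-class mean of the selected features (assumed stand-in for SVM)."""
    return {
        label: [mean(r[j] for r, l in zip(X, y) if l == label) for j in feats]
        for label in (0, 1)
    }


def predict(centroids, x, feats):
    """Assign x to the class whose centroid is closest."""
    def dist(label):
        return sum((x[j] - c) ** 2 for j, c in zip(feats, centroids[label]))
    return min(centroids, key=dist)


def cv_accuracy(X, y, k, folds=5):
    """Cross-validated accuracy using the top-k features.

    Features are re-ranked on each training fold so the held-out
    samples never influence the selection.
    """
    n, correct = len(X), 0
    for f in range(folds):
        test = set(range(f, n, folds))
        X_tr = [X[i] for i in range(n) if i not in test]
        y_tr = [y[i] for i in range(n) if i not in test]
        feats = rank_features(X_tr, y_tr)[:k]
        centroids = fit_centroids(X_tr, y_tr, feats)
        correct += sum(predict(centroids, X[i], feats) == y[i] for i in test)
    return correct / n


def incremental_feature_selection(X, y, max_k):
    """Evaluate the top-1, top-2, ..., top-max_k features; return the best k."""
    accs = {k: cv_accuracy(X, y, k) for k in range(1, max_k + 1)}
    best_k = max(accs, key=lambda k: (accs[k], -k))  # ties go to fewer features
    return best_k, accs


def make_toy_data(n=60, n_feats=20, informative=(0, 1), shift=3.0, seed=0):
    """Synthetic data: only the `informative` features differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_feats)]
        for j in informative:
            row[j] += shift * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    best_k, accs = incremental_feature_selection(X, y, max_k=5)
    print("best number of features:", best_k)
    print("accuracy by feature count:", accs)
```

To mirror the gender-specific analysis, the same procedure would be run separately on the male, female, and both-gender sample subsets and the resulting accuracies compared; the sketch deliberately leaves out that split because the abstract does not describe how the data were partitioned.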
Keywords/Search Tags:biomarker, protein biomarker, epigenetic biomarker, feature selection, biomarker detection