
Research On The Application Of Machine Learning Algorithms In The Identification Of Cancer-related Signature Features

Posted on: 2021-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Dong
Full Text: PDF
GTID: 2504306548478734
Subject: Chemical Engineering
Abstract/Summary:
With the arrival of the era of precision medicine, in-depth studies of cancer mechanisms have received increasing attention. Breakthroughs in sequencing technology have greatly improved the ability to explore tumor-related genes, RNAs, snoRNAs and other factors. Cancer precision medicine requires the identification of driver genes, biomarkers and other signature features to advance research and treatment. Epigenetic mechanisms and human snoRNAs have been shown to play important roles in the initiation and progression of cancer, and they offer great potential for improving clinical efficacy. DNA methylation in particular is one of the most well-studied forms of epigenetic modification. At the same time, the mortality rate of lung cancer is high worldwide, and the relationship between lung adenocarcinoma and smoking status remains unclear. This thesis therefore explores the application of classification, pattern recognition, feature extraction and other machine learning algorithms to the selection of cancer-related signatures. Two case studies are included.

1) Identification of tobacco exposure-related signature methylation probes. Tobacco exposure is one of the most important risk factors for lung cancer. Identifying tobacco-related signature methylation probes and analyzing their regulatory networks at different molecular levels may greatly help in understanding tobacco-related tumorigenesis. Three independent lung adenocarcinoma datasets were used to train and validate the tobacco exposure pattern classification model. A deep-selecting method was proposed and used to identify methylation signature probes from hundreds of thousands of whole-epigenome probes. The BIMC (biweight midcorrelation coefficient) algorithm and SRC (Spearman rank correlation) analysis were then used to identify associated genes at the gene regulation level, and a shortest path tracing method was used at the protein-protein interaction level. In total, 105 probes were identified as tobacco-related DNA methylation signatures. At the gene regulation level, 33 genes were found to be highly related to the signature probes by both the BIMC and SRC methods.

2) Recognition of snoRNAs. SnoRNAs (small nucleolar RNAs) are small RNA molecules approximately 60-300 nucleotides in length. They have been shown to play important roles in cancer occurrence and progression, so it is of great clinical importance to identify new snoRNAs as quickly and accurately as possible. A novel algorithm, ESDA (Elastically Sparse Partial Least Squares Discriminant Analysis), was proposed to improve the speed and performance of distinguishing snoRNAs from other RNAs in the human genome. In ESDA, kernel features are first selected from the variables extracted from both primary sequences and secondary structures, in order to optimize the extracted information. These features are then used as input variables by the sparse partial least squares discriminant analysis algorithm to train the final classification model, which distinguishes snoRNA sequences from other human RNAs. ESDA was also compared with other widely used algorithms and classifiers: snoReport, Random Forest, Distance Weighted Discrimination and support vector machine. The largest accuracy improvement obtained by ESDA was 25.1%. These results demonstrate the superior performance of ESDA and make it promising for identifying snoRNAs in the further development of precision medicine for cancer.
Keywords/Search Tags: Precision medicine, Machine learning, Signature identification, Lung adenocarcinoma, snoRNAs