Precision Medicine Research Based On Deep Cascade Centroid Classifier

Posted on: 2024-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: K Xie
GTID: 2544307160478104
Subject: Biological and Information Engineering (Professional Degree)
Abstract/Summary:
Cancer is a complex and diverse disease with extensive genetic heterogeneity and is one of the leading causes of human mortality. Even patients with the same type of cancer may respond very differently to anti-cancer drugs. Precision medicine has therefore become an active area of cancer research, in which early diagnosis, prognosis, and drug sensitivity prediction are among the important topics. With the advent of the big data era and the rapid development of biotechnology, building classifiers on omics data to make these predictions has become a popular approach. However, the high feature dimensionality, small sample size, and class imbalance of omics data generally make it difficult for traditional machine learning models to train effective classifiers that generalize well. Building machine learning algorithms for biomedical data therefore remains a major challenge in precision medicine.

To address these problems, this thesis proposes a new classifier, the Deep Cascade Centroid Classifier (Deep Centroid), which combines the stability of the centroid classifier with the strong fitting ability of the deep cascade strategy. Deep Centroid is an ensemble learning method with a multi-layer cascade structure that can perform representation learning. It works in two stages: feature scanning and cascade learning. In the feature scanning stage, all features are randomly divided into multiple feature sets of varying sizes, in order to explore potential feature sets with common functions among the high-dimensional features. In the cascade learning stage, a base classifier is trained on each feature set, and the outputs of all base classifiers are used as new features for training the next layer. Deep Centroid dynamically adjusts the scale of the feature scanning stage and adaptively determines the number of cascade layers based on the data size, so it can run stably on small-scale data.

To evaluate the model's prospects in precision medicine research, Deep Centroid is applied to three tasks: early diagnosis of lung cancer (cell-free DNA fragmentation pattern data), prognosis prediction of breast cancer (gene expression data), and drug sensitivity prediction of cell lines (gene expression data and DNA methylation data). It is compared with six other mainstream classification models. In independent validation for early cancer diagnosis and cancer prognosis, Deep Centroid outperforms the other machine learning methods in terms of the Matthews correlation coefficient (MCC). Functional annotation analysis of the key features it mines shows enrichment for many genes and processes closely associated with cancer occurrence, such as regulation of innate immune response, B cell apoptotic process, and cell-cell adhesion. In drug sensitivity prediction, Deep Centroid achieves the highest MCC and gives more accurate predictions on imbalanced biological data. In addition to cancer-related processes, the enrichment analysis results for this task are associated with enzymatic regulatory processes and ion channels related to drug metabolism, such as regulation of protein kinase activity and monoatomic cation transport.

In summary, this thesis proposes a new classifier for biomedical data, the Deep Cascade Centroid Classifier. The model has stable classification performance, is less prone to overfitting, and performs better in independent validation, and the experiments confirm its promise for applications in precision medicine. The functional annotation analysis shows that the features scanned by the model are biologically meaningful, which indicates that the model is biologically interpretable.
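
The two stages described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the names `scan_feature_sets`, `CentroidClassifier`, `fit_cascade`, `predict_cascade`, and `mcc` are hypothetical, labels are assumed to be binary (0/1), distances are assumed to be Euclidean, and the number of layers and feature sets is fixed here rather than adapted to the data size as the abstract describes. The final decision rule (averaging the last layer's scores) is also an assumption made for illustration.

```python
import numpy as np

def scan_feature_sets(n_features, n_sets, rng):
    """Randomly partition feature indices into n_sets non-empty sets of varying size."""
    perm = rng.permutation(n_features)
    # Distinct random cut points give sets of unequal size (assumed scheme).
    cuts = np.sort(rng.choice(np.arange(1, n_features), size=n_sets - 1, replace=False))
    return np.split(perm, cuts)

class CentroidClassifier:
    """Base classifier: compares distances to the two class centroids."""
    def __init__(self, feature_idx):
        self.feature_idx = feature_idx

    def fit(self, X, y):
        Xs = X[:, self.feature_idx]
        self.c0 = Xs[y == 0].mean(axis=0)
        self.c1 = Xs[y == 1].mean(axis=0)
        return self

    def score(self, X):
        Xs = X[:, self.feature_idx]
        d0 = np.linalg.norm(Xs - self.c0, axis=1)
        d1 = np.linalg.norm(Xs - self.c1, axis=1)
        return d0 - d1  # positive means closer to the class-1 centroid

def fit_cascade(X, y, n_layers=2, n_sets=5, seed=0):
    """Train layers of centroid classifiers; each layer's scores feed the next layer."""
    rng = np.random.default_rng(seed)
    layers, feats = [], X
    for _ in range(n_layers):
        k = min(n_sets, feats.shape[1])
        sets = scan_feature_sets(feats.shape[1], k, rng)
        clfs = [CentroidClassifier(idx).fit(feats, y) for idx in sets]
        layers.append(clfs)
        feats = np.column_stack([c.score(feats) for c in clfs])
    return layers

def predict_cascade(layers, X):
    feats = X
    for clfs in layers:
        feats = np.column_stack([c.score(feats) for c in clfs])
    # Assumed decision rule: average the last layer's scores.
    return (feats.mean(axis=1) > 0).astype(int)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0.0 if undefined)."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Synthetic, imbalanced toy data (not from the thesis): 40 negatives, 10 positives.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 20)), rng.normal(10.0, 1.0, (10, 20))])
y = np.array([0] * 40 + [1] * 10)
layers = fit_cascade(X, y)
y_hat = predict_cascade(layers, X)
print("training MCC:", mcc(y, y_hat))
```

MCC is used for evaluation here because, as the abstract notes, the data are imbalanced: it accounts for all four cells of the confusion matrix, so a classifier that always predicts the majority class scores 0 rather than appearing accurate.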
Keywords/Search Tags: Precision Medicine, Machine Learning, Centroid Classifier, Feature Scanning, Cascade Learning, Gene Function Annotation