
Deep Unlabeled Data-Driven Classification Models And Their Applications On Tumor Recognition

Posted on: 2020-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: W M Wu
Full Text: PDF
GTID: 2404330575497824
Subject: Computational Mathematics
Abstract/Summary:
Tumor recognition is one of the most important problems in image processing and pattern recognition, and it can assist medical diagnosis. It relies mainly on two types of data: gene data and image data. Traditional pattern recognition methods are mostly based on labeled training samples. Tumor recognition, however, is a typical small-sample problem, because marking lesion areas requires experienced physicians and is time-consuming and laborious. Tumor recognition can therefore be improved by mining effective information from a large number of unlabeled samples.

Non-negative matrix factorization (NMF) is an unsupervised feature representation learning method. NMF does not rely on category information and can exploit the useful information contained in all available samples simultaneously, even when only a small number of training samples is available. NMF has received increasing attention in the field of tumor recognition. However, some problems remain: (1) the NMF model is a typical underdetermined system, so its solution is not unique; (2) NMF is affected by the initial value of the iteration; (3) some useful information remains hidden within the extracted features. To tackle these problems, three deep unlabeled data-driven models are constructed by combining the complementary advantages of NMF and deep learning, and the characteristics of tumor data are integrated into the models as constraints (prior information). The models have good generalization ability and stability, and they are optimized and analyzed. The main contributions are as follows.

(1) A layer-wise pre-training multi-layer low-rank NMF (LPML-LRNMF) model is presented and applied to image-based tumor recognition. LPML-LRNMF is realized as a modified NMF model motivated by the hierarchical and layer-wise pre-training strategies of deep learning. A low-rank constraint is integrated into the feature representation learning model to reflect the intrinsic characteristics of images. The hierarchical strategy enhances the representation learning ability of NMF by exploring the essential information contained in the available samples. The layer-wise pre-training strategy enhances the stability of NMF by alleviating its sensitivity to the initial values of the iteration. Moreover, the proposed LPML-LRNMF model is optimized via ADMM and the corresponding convergence is analyzed. Finally, a tumor recognition method based on the LPML-LRNMF model is proposed. Experiments on the public dataset (MIAS) and on an actual clinical dataset show that the classification accuracy, specificity and sensitivity reach the clinical acceptance level.

(2) A feature learning method based on layer-wise pre-training multi-layer sparse NMF (LPML-SNMF) is proposed and applied to gene-based tumor recognition. The advantage of the LPML-SNMF method is that it combines the complementary strengths of NMF and deep learning. A sparsity constraint is integrated into the feature representation learning model to reflect the intrinsic characteristics of gene data. It is worth noting that microarray gene expression data are characterized by small sample sizes. In contrast to existing gene selection methods, the DIF-based technique is established for the first time by incorporating the clinical misdiagnosis rate into gene selection. Finally, the LPML-SNMF model is applied to tumor recognition. Extensive experiments on five public microarray gene expression datasets show that the LPML-SNMF-based tumor recognition framework is superior to other methods.

(3) A deep unlabeled data-driven classification (DUDC) model is constructed and applied to gene-based tumor recognition. The DUDC model combines feature representation learning and classification in a single model and refines the classification results by iterative optimization. The feature representation learning part again takes NMF as an example. The DUDC model shows good generalization ability and stability even when the simplest linear regression classifier is used, especially for small training sets and imbalanced classification problems. Moreover, the proposed DUDC model is optimized via a generalized ADMM and the corresponding convergence is analyzed. Finally, the performance of the DUDC model is discussed through its applications in tumor recognition. Extensive experiments are conducted on five public microarray gene expression datasets. Compared with published state-of-the-art methods and results, the model achieves significant improvements in classification accuracy, specificity and sensitivity.
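
The two NMF issues named in the abstract (non-unique solutions and sensitivity to initialization) and the greedy layer-wise idea can be illustrated with a minimal sketch. This is not the thesis's method: it uses the standard multiplicative-update rules for plain Frobenius-norm NMF rather than ADMM, and it omits the low-rank and sparsity constraints. The function names `nmf`, `multilayer_nmf` and `rel_error`, the synthetic data, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def nmf(X, rank, n_iter=200, seed=0, eps=1e-10):
    """Factor non-negative X (m x n) as W (m x rank) @ H (rank x n).

    Plain multiplicative updates for the Frobenius objective;
    the random start depends on `seed`.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Element-wise updates keep W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def multilayer_nmf(X, ranks, **kwargs):
    """Greedy layer-wise factorization: X ~ W1 @ W2 @ ... @ WL @ HL.

    Each layer factorizes the coefficient matrix H of the previous layer.
    """
    Ws, H = [], X
    for r in ranks:
        W, H = nmf(H, r, **kwargs)
        Ws.append(W)
    return Ws, H

def rel_error(X, X_hat):
    """Relative Frobenius reconstruction error."""
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# Synthetic non-negative data (assumption): 30 features x 40 samples.
rng = np.random.default_rng(42)
X = rng.random((30, 5)) @ rng.random((5, 40))

# Two different random starts: compare fit quality and the factors themselves.
W_a, H_a = nmf(X, rank=5, seed=0)
W_b, H_b = nmf(X, rank=5, seed=1)
print("seed 0 error:", rel_error(X, W_a @ H_a))
print("seed 1 error:", rel_error(X, W_b @ H_b))
print("factors differ:", not np.allclose(W_a, W_b))

# Two-layer factorization: 30 -> 10 -> 5 dimensional representation.
Ws, H_top = multilayer_nmf(X, ranks=[10, 5])
print("two-layer error:", rel_error(X, Ws[0] @ Ws[1] @ H_top))
```

In this sketch the columns of `H_top` are the deepest sample representations and could be passed to any classifier. Running `nmf` with different seeds shows how the recovered factors depend on the starting point, which is the sensitivity that the layer-wise pre-training strategy described above is meant to alleviate.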
Keywords/Search Tags: Unlabeled data-driven, non-negative matrix factorization, deep representation learning, optimization, tumor recognition