Application Of Multi-objective Fuzzy Clustering Method In Cancer Classification

Posted on:2021-01-27

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Lin

Full Text:PDF

GTID:2514306302454164

Subject:Applied Statistics

Abstract/Summary:

Cancer classification and prediction is an extremely important task in biomedical research, providing an important basis for the diagnosis and treatment of cancer. With the development and improvement of high-throughput technology, a large amount of gene expression profile data is now used in cancer classification research. Gene expression profile data is a collection of gene expression levels, and modern molecular biology shows that the occurrence and development of cancer are closely related to gene mutations and the loss of tumor suppressor gene function. Gene expression profile data has therefore become the main data source for cancer classification research. From a genetic point of view, scientifically and effectively selecting a small number of cancer-related genes not only helps in the classification and prediction of cancer; further research on these oncogenes also contributes to the development of related drugs, in line with the "precision medicine" and "targeted therapy" approaches that have become popular in recent years.

To address the high dimensionality and small sample size of gene expression profile data, this thesis proposes the FCM-SVM-RFE model, which is based on SVM-RFE. First, the gene expression profile is pre-processed, and the genes are then divided into gene modules by multi-objective fuzzy clustering. In the multi-objective fuzzy clustering, in addition to the commonly used clustering validity indices FPC and PBM, BHI is also used as an objective function so that the genes within a cluster are more biologically similar; NSGA-Ⅱ is used to solve the multi-objective optimization problem, so that the Pareto optimal solutions can be obtained more quickly. Next, SVM-RFE is applied for feature selection within each gene module obtained by clustering, which removes redundant genes more effectively. Then, following the idea of bagging, several different classification algorithms are trained to obtain multiple classification prediction models that differ substantially from one another, combining the advantages of the individual algorithms to address the poor stability of a single classification model. Finally, the test samples are classified using the trained multi-algorithm, multi-model ensemble and a voting decision mechanism to obtain the final cancer classification results, reducing the classification errors caused by the high-dimensional, small-sample nature of gene expression profile data.

Experiments were conducted on four different cancer gene expression profile data sets from the GEO database: acute myeloid leukemia, breast cancer, colon cancer, and Taiwan non-smoking lung cancer. The results show that the classification accuracy obtained with FCM-SVM-RFE is more than 5% higher than that obtained with SVM-RFE, and that the classification accuracy of the integrated classification prediction model is higher than that of each single classification learning algorithm. The results also show the following: when dividing gene modules, multi-objective fuzzy clustering yields a division that is biologically meaningful; in constructing the feature selection method, taking gene co-expression into account and deleting features within each gene module through FCM-SVM-RFE better resolves redundancy in gene feature selection; in the sampling process, diversity sampling of the training set counters the overfitting that arises during model training, so the generalization ability of the classification prediction model is enhanced; and in constructing the integrated classification model, multiple classification algorithms are combined so that they complement each other's strengths and overcome the limitations of a single classification algorithm.
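
The two later stages of the pipeline described above (SVM-RFE inside each gene module, then a bagged multi-algorithm ensemble with voting) can be illustrated with a minimal Python sketch using scikit-learn. This is not the thesis's implementation. It rests on several assumptions: the data are synthetic, the gene modules are random placeholder labels standing in for the output of the multi-objective fuzzy clustering step (which is not reproduced here), and the helper names `select_genes_per_module`, `build_voting_ensemble`, and `run_demo`, the choice of classifiers, and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def select_genes_per_module(X, y, modules, keep_per_module=2):
    """Run SVM-RFE separately inside each gene module.

    `modules[j]` is the module label of gene (column) j. Returns the sorted
    column indices of the genes kept across all modules.
    """
    selected = []
    for module_id in np.unique(modules):
        cols = np.flatnonzero(modules == module_id)
        n_keep = min(keep_per_module, cols.size)
        # A linear SVM exposes coef_, which RFE uses to rank and drop genes.
        rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_keep, step=1)
        rfe.fit(X[:, cols], y)
        selected.extend(cols[rfe.support_].tolist())
    return np.array(sorted(selected))


def build_voting_ensemble(seed=0):
    """Bag three different base learners and combine them by majority vote.

    The three algorithms are an assumption; the abstract does not name them.
    """
    base_learners = [
        ("svm", SVC(kernel="linear")),
        ("logreg", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier(n_neighbors=3)),
    ]
    bagged = [
        (name, BaggingClassifier(clf, n_estimators=5, random_state=seed))
        for name, clf in base_learners
    ]
    return VotingClassifier(estimators=bagged, voting="hard")


def run_demo(seed=0):
    # Synthetic stand-in for an expression matrix (samples x genes).
    X, y = make_classification(
        n_samples=60, n_features=40, n_informative=6, random_state=seed
    )
    # Placeholder module labels; the thesis derives these by fuzzy clustering.
    modules = np.random.default_rng(seed).integers(0, 5, size=X.shape[1])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # Select genes on the training split only, to avoid leaking test data.
    genes = select_genes_per_module(X_tr, y_tr, modules)
    model = build_voting_ensemble(seed).fit(X_tr[:, genes], y_tr)
    return genes, model.predict(X_te[:, genes]), y_te


if __name__ == "__main__":
    genes, predictions, truth = run_demo()
    print("selected genes:", genes)
    print("test accuracy:", float(np.mean(predictions == truth)))
```

Running the selection inside each module, rather than once over all genes, is what keeps at most a few representatives of each co-expressed group, which is the redundancy argument made in the abstract.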

Keywords/Search Tags:

gene expression profile, high dimensional and small sample size, feature selection, NSGA-Ⅱ

Related items

1	Prediction Of Local Recurrence Of Head And Neck Cancer Unimodality Based On Small Sample And High-dimensional Gene Expression Data
2	Research On Machine Learning Method Of High Dimensional Small Sample (Medical) Data
3	Exploration Of Pathogenic Loci Of Genetic Diseases And Research On A Kind Of High-dimensional Small Sample Problem
4	Research On The Algorithm Of Gene Feature Selection Based On Classification Technology
5	Study On Informative Gene Selection And Classification Of Tumor
6	Research On Swarm Intelligence Feature Selection Algorithm For Small Sample (Medical) Data
7	The Research On Gene Selection Based Shrinkage Feature Selection Algorithm For Cancer Classification
8	Research On Feature Selection Method For Chinese Medicine Metabolomics Data Based On Lasso
9	Research On Feature Selection Of Tumor Genes
10	Application And Research Of Filter Ranking Feature Selection Method In Leukemia Typing