
Breast Cancer Diagnosis Based On Feature Selection And Support Vector Machine

Posted on: 2020-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Zheng
Full Text: PDF
GTID: 2404330596986007
Subject: Statistics
Abstract/Summary:
Tumors are among the most serious causes of death worldwide, endangering human life and health, and breast cancer is one of the leading causes of death among women. The key to curing breast cancer lies in early diagnosis and treatment, so it is critical to find an algorithm that can accurately identify the tumor type so that treatment can begin early. With advances in science and technology and growing attention to health, more and more disease-related data are being collected. At the same time, the growing number of features in high-dimensional data increases the computation needed for analysis and processing, and the volume of the feature space grows exponentially with the number of dimensions, causing the so-called "curse of dimensionality". A primary task of data analysis is therefore to reduce the dimensionality of the feature data and eliminate redundant features, which reduces the computational burden and makes the results more accurate.

This thesis studies breast cancer diagnosis from three aspects:
(1) For breast cancer clinical data, an algorithm based on hierarchical clustering and the support vector machine (H-SVM) is used. Hierarchical clustering first serves as a feature selection method that reduces the dimensionality of the data and extracts a new representation of it. The raw data are then transformed according to the clustering results to form a new data set, and a support vector machine classifier performs the diagnosis on the new data set, giving relatively good results.
(2) An algorithm based on K-medoids clustering and the support vector machine (KDSVM) is used. K-medoids clustering serves as the feature selection method that reduces the dimensionality of the breast cancer data and extracts a new representation of it. The raw data are then transformed into a new data set according to the clustering results, and a support vector machine classifier performs the diagnosis on the new data set. The classification accuracy is better than that of the H-SVM algorithm.
(3) For the higher-dimensional breast cancer gene expression data, regularization is combined with support vector machine (SVM) classification. Two regularization penalty methods, Lasso and Elastic Net, are each used to screen gene features, and an SVM is then applied to the selected features, achieving higher classification accuracy.
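The reduce-then-classify idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes scikit-learn, uses the Wisconsin diagnostic data set bundled with that library as stand-in data, and substitutes `FeatureAgglomeration` (hierarchical clustering of features, with each cluster pooled into one feature) for the unspecified clustering-based transformation. The Lasso and Elastic Net selectors are fitted as regressions on the 0/1 labels, which is a simplification. The names `build_pipelines` and `evaluate`, the cluster count, and the penalty strengths are hypothetical choices. A K-medoids variant is omitted because K-medoids is not part of core scikit-learn.

```python
from sklearn.cluster import FeatureAgglomeration
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): the diagnostic data set shipped with
# scikit-learn, 569 samples with 30 numeric features and a binary label.
X, y = load_breast_cancer(return_X_y=True)


def build_pipelines():
    """Return hypothetical reduce-then-classify pipelines keyed by name."""
    return {
        # Hierarchical clustering of the features; each cluster of features
        # is pooled (averaged) into a single new feature before the SVM.
        "hierarchical+SVM": make_pipeline(
            StandardScaler(),
            FeatureAgglomeration(n_clusters=8),
            SVC(kernel="rbf"),
        ),
        # Lasso (L1 penalty): keep features with non-zero coefficients.
        "lasso+SVM": make_pipeline(
            StandardScaler(),
            SelectFromModel(Lasso(alpha=0.01, max_iter=10000), threshold=1e-5),
            SVC(kernel="rbf"),
        ),
        # Elastic Net (mixed L1/L2 penalty): same selection rule.
        "elasticnet+SVM": make_pipeline(
            StandardScaler(),
            SelectFromModel(
                ElasticNet(alpha=0.01, l1_ratio=0.5, max_iter=10000),
                threshold=1e-5,
            ),
            SVC(kernel="rbf"),
        ),
    }


def evaluate(X, y, cv=5):
    """Mean cross-validated accuracy for each pipeline."""
    return {
        name: cross_val_score(pipe, X, y, cv=cv).mean()
        for name, pipe in build_pipelines().items()
    }


scores = evaluate(X, y)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Placing the reduction step inside the pipeline means it is refitted on each training fold, so the held-out fold does not influence which features are kept. The accuracies printed by this sketch say nothing about the results reported in the thesis.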
Keywords/Search Tags: breast cancer, machine learning, feature selection, clinical data, genetic data