
Research On Machine Learning Method Of High Dimensional Small Sample (Medical) Data

Posted on: 2022-03-18  Degree: Master  Type: Thesis
Country: China  Candidate: T Yang  Full Text: PDF
GTID: 2494306329472984  Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of artificial intelligence in the medical industry, clinical diagnosis and treatment has entered the era of intelligence. As medical data continues to accumulate, extracting information that contributes to medicine has become a new direction for smart medicine. In this direction, feature selection is the most important method. It constructs a feature selection model based on the distribution of the initial feature set of the data and uses the selected feature subset for classification. Feature selection methods and classification algorithms usually assume that the samples are balanced and sufficient in number. Complete medical data sets are hard to obtain, however: the sample size is small and the feature dimension is high, which makes them small-sample data sets. The relationships between the features of a medical data set are also difficult to define, because the features are numerous and may be related to one another. A high-dimensional feature space strongly affects how well an algorithm can be applied, and it may also contain redundant information that impairs the judgment of the classification algorithm. Therefore, when samples are scarce and high-dimensional, conventional methods for screening feature subsets struggle to provide effective training for the classification algorithm, and classification accuracy is low. How to apply feature selection and machine learning algorithms to high-dimensional, small-sample medical data sets remains a difficult research problem. In response to these problems, this thesis studies imbalanced learning, improves feature selection algorithms, selects the optimal feature subset to assist clinical treatment, builds a classification model, and improves the effectiveness of classification and recognition on high-dimensional, small-sample data.

The main work is as follows:

1. To address the problem of setting weights in feature selection, an algorithm that combines the Wilcoxon rank-sum test with ReliefF and LASSO (Wilcoxon-ReliefF-LASSO, WRL) is proposed. The algorithm generates feature importance scores and ranks features by importance, which solves the problem of weight setting in feature selection. Compared with the traditional ReliefF and LASSO algorithms, WRL focuses on improving the effectiveness of feature selection and the performance of classification and recognition. This thesis applies the algorithm to a colorectal cancer medical data set, establishes a clinical medical model, and validates the results. Experimental results show that the feature selection results and classification accuracy of the algorithm are better than those of the ReliefF, LASSO, and MRMR algorithms.

2. To address the correlation between features, a feature selection algorithm based on WRL and the maximum information coefficient (WRL-MIC-CFS, WMCFS) is proposed. This algorithm takes the correlation between features into account and traverses all feature subsets to select the optimal solution. It thereby avoids the tendency of traditional feature selection algorithms to fall into a local optimum while extracting high-quality feature subsets, and it improves the validity of feature selection. In this thesis, the algorithm is applied to the colorectal cancer medical data set to establish a colorectal cancer classification prediction model. The experimental results show that the algorithm's classification accuracy, variance, resistance to overfitting, and recall rate are better than those of WRL, WCFS (WRL-CFS), MRMR, and other feature selection methods.

This thesis provides effective solutions for clinical diagnosis, improves existing feature selection methods so that they apply better to high-dimensional, small-sample data, and supplements the processing schemes of machine learning algorithms in practical computer-aided medical diagnosis. It also explores the expression of suspicious proteins that may induce cancer, providing scientific methods for current medical decision-making and clinical research. The feasibility of this research is verified by testing on real, high-dimensional, small-sample medical data.
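The two-stage idea described in the abstract, a rank-based filter followed by a correlation-aware subset search, can be illustrated with a small sketch. This is not the thesis's WRL or WMCFS implementation: the ReliefF and LASSO stages are omitted, absolute Pearson correlation stands in for the maximum information coefficient, and the way the two stages are chained is an assumption. The function names (`rank_sum_z`, `cfs_merit`, `select_features`), the `keep` parameter, and the toy data are all hypothetical. The merit score uses the standard CFS formula, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(feature, labels):
    """Wilcoxon rank-sum z-score (normal approximation, no tie correction)."""
    ranks = average_ranks(feature)
    n1 = sum(1 for y in labels if y == 1)
    n2 = len(labels) - n1
    w = sum(r for r, y in zip(ranks, labels) if y == 1)
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    return (w - mean) / math.sqrt(var)


def abs_pearson(a, b):
    """Absolute Pearson correlation; stand-in for MIC (an assumption)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return abs(cov) / math.sqrt(va * vb)


def cfs_merit(subset, columns, labels):
    """CFS merit: rewards class relevance, penalizes redundancy."""
    k = len(subset)
    r_cf = sum(abs_pearson(columns[j], labels) for j in subset) / k
    if k == 1:
        return r_cf
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs_pearson(columns[a], columns[b]) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def select_features(columns, labels, keep=3):
    """Filter to the `keep` best-ranked features, then search all subsets."""
    ranked = sorted(
        range(len(columns)),
        key=lambda j: abs(rank_sum_z(columns[j], labels)),
        reverse=True,
    )
    candidates = sorted(ranked[:keep])
    subsets = (
        s
        for k in range(1, len(candidates) + 1)
        for s in combinations(candidates, k)
    )
    return list(max(subsets, key=lambda s: cfs_merit(s, columns, labels)))


# Toy data (hypothetical): 8 samples, 3 features stored column-wise.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [1, 2, 3, 4, 5, 6, 7, 8],  # informative
    [1, 2, 3, 5, 4, 6, 7, 8],  # near-duplicate of feature 0 (redundant)
    [3, 6, 1, 8, 2, 7, 4, 5],  # unrelated to the labels
]
selected = select_features(columns, labels, keep=3)
print("selected feature indices:", selected)
```

The exhaustive subset search is only feasible here because the rank-sum filter first reduces the candidates to a handful of features; the number of subsets grows as 2^keep, so `keep` has to stay small.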
Keywords/Search Tags: Feature selection, small sample, classifier, high dimension sample