Multi-label Feature Selection Algorithm Based On Sample Differences

Posted on:2020-09-30

Degree:Master

Type:Thesis

Country:China

Candidate:L Tang

Full Text:PDF

GTID:2404330575453371

Subject:Computer technology

Abstract/Summary:


In the era of big data, medical diagnosis based on data mining technology plays an important role in medical assistance. Such diagnosis can quickly assess a patient's case, characteristics, disease type and severity, and it places high demands on real-time performance and accuracy. Feature ranking and selection algorithms are now applied in many areas of big data analysis. Addressing the high-dimensional feature space and the multi-label label space of such data, this thesis seeks the features that play an important role in determining the category of samples in multi-label data sets. Through in-depth analysis of the inherent characteristics of existing data, and following the ideas of granular computing, it looks for features that discriminate between samples of different categories from two viewpoints: sample granulation and feature granulation. The selected features are then used for classification modeling to improve the accuracy and generalization ability of the classifier. Based on information granulation, the thesis studies two aspects:

(1) Because multi-label data is often high-dimensional with few samples, classification models built on it overfit easily. A multi-label feature selection algorithm based on sample differences is proposed, which granulates the features in the data sets using clustering and combines this with large-margin (large-interval) theory. The experimental results show that the model effectively improves classification accuracy and reduces the computational cost of feature selection.

(2) To measure the correlation between features and the similarity between features and labels, and thereby select high-quality features, the concept of a neighborhood difference factor is proposed for the multi-label learning setting. The neighborhood difference factor can be used not only to measure the discriminative ability of feature subsets but also to distinguish effectively between samples. The effectiveness of the proposed algorithm is verified from three aspects: the compactness of the feature subset, classification accuracy, and how classification accuracy changes with the number of features.
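
The abstract does not give the formulas behind the large-interval (large-margin) criterion, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It swaps in a Relief-style margin computed separately for each label: a feature scores well when a sample is farther from its nearest neighbor with a different label value (near miss) than from its nearest neighbor with the same label value (near hit). The function name `margin_feature_scores`, the L1 distance, and the per-label averaging are all assumptions made for illustration.

```python
import numpy as np

def margin_feature_scores(X, Y):
    """Relief-style margin score per feature, averaged over labels.

    X: (n_samples, n_features) feature matrix.
    Y: (n_samples, n_labels) binary label matrix.
    Illustrative stand-in only; not the method proposed in the thesis.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=int)
    n, d = X.shape

    # Per-feature absolute differences and pairwise L1 distances.
    diff = np.abs(X[:, None, :] - X[None, :, :])  # shape (n, n, d)
    dist = diff.sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbor

    scores = np.zeros(d)
    terms = 0
    for l in range(Y.shape[1]):
        # same[i, k] is True when samples i and k agree on label l.
        same = Y[:, l][:, None] == Y[:, l][None, :]
        for i in range(n):
            hit_d = np.where(same[i], dist[i], np.inf)
            miss_d = np.where(~same[i], dist[i], np.inf)
            # Skip samples that lack a near hit or a near miss for this label.
            if not (np.isfinite(hit_d).any() and np.isfinite(miss_d).any()):
                continue
            hit, miss = np.argmin(hit_d), np.argmin(miss_d)
            # Margin contribution: distance to near miss minus distance to near hit.
            scores += diff[i, miss] - diff[i, hit]
            terms += 1
    return scores / max(terms, 1)
```

Features can then be ranked by descending score and the top-ranked ones kept for classification. Building the full pairwise difference array costs O(n²d) memory, which is acceptable for the small-sample setting the abstract describes but would need a neighbor index for larger data.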

Keywords/Search Tags:

Medical diagnosis, Feature selection, Information granulation, Large interval, Clustering


Related items

1	Fundamental Theory And Application Study On Large For Gestational Age Infants Using Machine Learning Techniques
2	Research On SNP-based Feature Selection And Diagnosis Model For Schizophrenia
3	Multi-task Feature Selection Algorithm And Its Application For Multimodal Neuro Image
4	Research On Feature Selection And Classification For Medical Imbalanced Data
5	Research On Feature Selection And Classification Method Of FMRI Data Based On Statistical Information
6	Research On The Application Of Combination Feature Selection Algorithm Based On CNN In Medical Data
7	Semi-supervised Clustering Ensemble For Bio-molecular Pattern Mining
8	Research On Tumor Feature Gene Selection Method Based On DNA Microarray Data
9	Research And Implementation Of Medical Image Retrieval System Based On Attention Clustering Feature
10	Application Of Hybrid Feature Selection Model In The Diagnosis Of Gestational Diabetes