
Multi-label Feature Selection Algorithm Based On Sample Differences

Posted on: 2020-09-30 | Degree: Master | Type: Thesis
Country: China | Candidate: L Tang
GTID: 2404330575453371 | Subject: Computer technology
Abstract/Summary:
In the era of big data, medical diagnosis based on data mining technology plays an important role in assisting clinical practice. Such diagnosis must quickly assess a patient's condition, characteristics, disease type and severity, so it places high demands on both real-time performance and accuracy. Feature ranking and feature selection algorithms are already widely applied in big data analysis. Because the data in question have a high-dimensional feature space and a multi-label label space, this thesis aims to identify the features that play an important role in determining the categories of samples in multi-label data sets. To this end, the inherent characteristics of the existing data are analyzed in depth and, following the idea of granular computing, features that distinguish samples of different categories are sought from the perspectives of sample granulation and feature granulation. The selected features are then used to build classification models, with the goal of improving their accuracy and generalization ability. Based on information granulation, this thesis studies two aspects, sample granulation and feature granulation, as follows.

(1) Multi-label data are often high-dimensional with few samples, which makes classification models prone to over-fitting. This thesis proposes a multi-label feature selection algorithm based on sample diversity, which granulates the features of a data set by clustering and combines this with the large-margin principle. Experimental results show that the model effectively improves classification accuracy and reduces the computational cost of feature selection.

(2) To select high-quality features by measuring both the correlation among features and the similarity between features and labels, this thesis proposes the concept of a neighborhood difference factor for multi-label learning. The neighborhood difference factor can be used not only to measure the discriminative ability of a feature subset but also to effectively distinguish the differences between samples. The effectiveness of the proposed algorithm is verified from three aspects: the compactness of the selected feature subset, classification accuracy, and how classification accuracy changes with the number of selected features.
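The abstract does not give a formula for the neighborhood difference factor, so the following is only a minimal sketch of the general idea under stated assumptions, not the thesis's definition. It assumes a hypothetical subset score, `neighborhood_label_difference`: for each sample, take its k nearest other samples under a candidate feature subset and average the normalized Hamming distance between their label vectors. A low score means samples that look alike under the subset also share labels, i.e. the subset discriminates well between samples with different label sets. The k-nearest-neighbor neighborhood (granular-computing work often uses δ-neighborhoods instead), the Hamming label distance, the function names and the greedy stopping rule in `greedy_select` are all illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]
LabelMatrix = Sequence[Sequence[int]]


def _distance(a: Sequence[float], b: Sequence[float], features: Sequence[int]) -> float:
    """Euclidean distance between two samples restricted to `features`."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))


def neighborhood_label_difference(X: Matrix, Y: LabelMatrix,
                                  features: Sequence[int], k: int = 3) -> float:
    """Hypothetical subset score (an assumption, not the thesis's formula):
    mean normalized Hamming distance between each sample's label vector and
    those of its k nearest other samples under `features`. Lower is better."""
    n, q = len(X), len(Y[0])
    total, pairs = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # sort() is stable, so distance ties fall back to index order
        others.sort(key=lambda j: _distance(X[i], X[j], features))
        for j in others[:k]:
            # fraction of labels on which sample i and neighbor j disagree
            total += sum(Y[i][l] != Y[j][l] for l in range(q)) / q
            pairs += 1
    return total / pairs if pairs else 0.0


def greedy_select(X: Matrix, Y: LabelMatrix, k: int = 3,
                  max_features: Optional[int] = None) -> List[int]:
    """Forward selection: repeatedly add the feature that gives the lowest
    score, stopping when no remaining candidate strictly improves it."""
    d = len(X[0])
    limit = d if max_features is None else min(max_features, d)
    selected: List[int] = []
    current = math.inf
    while len(selected) < limit:
        candidates = [f for f in range(d) if f not in selected]
        scores = {f: neighborhood_label_difference(X, Y, selected + [f], k)
                  for f in candidates}
        best = min(candidates, key=lambda f: scores[f])
        if scores[best] >= current:
            break  # no strict improvement: stop adding features
        selected.append(best)
        current = scores[best]
    return selected


# Toy data: feature 0 separates the two label sets, feature 1 is noise.
X = [[0.0, 0.9], [0.1, 0.1], [0.05, 0.5], [1.0, 0.2], [0.9, 0.8], [0.95, 0.4]]
Y = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
print(greedy_select(X, Y, k=2))  # → [0]
```

On the toy data, feature 0 alone already gives every sample label-consistent neighbors (score 0.0), so adding the noisy feature 1 cannot improve the score and the search stops. Greedy forward search is used here only because it is simple; the abstract does not say which search strategy the thesis uses.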
Keywords/Search Tags: Medical diagnosis, Feature selection, Information granulation, Large margin, Clustering