Research On Multi-Label Data Modeling Based On Label Relationship

Posted on:2021-04-09

Degree:Master

Type:Thesis

Country:China

Candidate:Q Gao

GTID:2370330620963512

Subject:Computer software and theory

Abstract/Summary:

Real-world data is often multi-labeled. For example, a picture may carry semantic annotations such as "desert", "camel", and "blue sky"; a news report may cover topics such as "economy", "war", and "politics"; and a paragraph may express both "pleasure" and "sadness". The high dimensionality of multi-label data makes data mining tasks such as classification and clustering more difficult, for example by increasing the time complexity of the mining algorithm and the complexity of the model. Feature selection is a data pre-processing technique that uses as few features as possible to achieve comparable mining results under the same conditions. Current research on feature selection for multi-label data has not yet made full use of the relationships between labels. Moreover, after feature selection an existing classifier is usually applied directly, so the results of feature selection are not well exploited. Therefore, we study feature selection for multi-label data, and the design of a classifier for the feature selection result, based on the fusion of label relationships. The main research contents and conclusions are as follows:

(1) Multi-label attribute reduction based on fuzzy inconsistent pairs. This thesis treats each label as a random variable, uses KL divergence to measure the relationships between labels, assigns each label a weight based on those relationships, and uses the label weights to define fuzzy inconsistent sample pairs. Attribute importance is defined by the ability of attributes to distinguish fuzzy inconsistent sample pairs. On this basis, a multi-label attribute reduction algorithm based on fuzzy inconsistent pairs is proposed. The effectiveness of the proposed algorithm is demonstrated on 8 public multi-label data sets using 6 evaluation metrics.

(2) K-nearest neighbor multi-label classifier based on the fusion of label relationships. In a multi-label data set, different labels can take the value 0 or 1 at the same time. We therefore assume that there is a linear relationship between the labels and use association rules to mine it. We then use the mining results to assign label weights, from which we obtain a ranked sequence of labels. We combine an existing attribute reduction algorithm with the label weights to obtain a new reduced feature subset, and we define a sample distance measure based on this feature subset. Combining the ranked label sequence with the influence between labels, we propose a K-nearest neighbor multi-label classifier that fuses label relationships. The effectiveness of the proposed algorithm is demonstrated on 5 public multi-label data sets using 6 evaluation metrics.

In summary, we fuse label relationships into both a feature selection algorithm for multi-label data and a classifier designed for the feature selection result. Fusing label relationships improves on previous results obtained without them, for both feature selection and classification of multi-label data, and provides new methods for processing multi-label data.
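
The idea in part (1) of treating each label as a random variable and comparing labels with KL divergence can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: each label is modeled as a Bernoulli variable estimated from its column frequency, and the weighting rule in the hypothetical `label_weights` function (labels that are closer to the others receive larger weights) is not taken from the thesis.

```python
import numpy as np


def label_kl_matrix(Y, eps=1e-9):
    """Pairwise KL divergence between labels.

    Assumption: each label is a Bernoulli variable whose parameter is the
    fraction of samples carrying that label. Entry [i, j] is
    KL(Bernoulli(p_i) || Bernoulli(p_j)).
    """
    Y = np.asarray(Y, dtype=float)
    # Clip to avoid log(0) for labels that are always on or always off.
    p = np.clip(Y.mean(axis=0), eps, 1.0 - eps)
    pi = p[:, None]
    pj = p[None, :]
    return pi * np.log(pi / pj) + (1.0 - pi) * np.log((1.0 - pi) / (1.0 - pj))


def label_weights(Y):
    """Hypothetical weighting rule, not the thesis's definition.

    Labels with a smaller average symmetric KL divergence to the other
    labels get a larger weight; the weights are normalized to sum to 1.
    """
    Y = np.asarray(Y, dtype=float)
    kl = label_kl_matrix(Y)
    sym = kl + kl.T  # symmetrized divergence
    q = Y.shape[1]
    mean_div = sym.sum(axis=1) / max(q - 1, 1)  # diagonal is zero
    score = 1.0 / (1.0 + mean_div)
    return score / score.sum()


# Toy label matrix: 4 samples (rows), 3 labels (columns).
Y_toy = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
])
kl_toy = label_kl_matrix(Y_toy)
w_toy = label_weights(Y_toy)
```

Note that this sketch compares only the marginal distributions of the labels, so two labels with the same frequency have zero divergence even if they never co-occur. A measure based on joint or conditional label distributions would capture co-occurrence, and the thesis may well use one.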

Keywords/Search Tags:

Multi-label data, Feature selection, Fuzzy rough sets, Label relationship, K nearest neighbor classification

Related items

1	Using Multi-label Learning Methods To Study Protein Subcellular Localization Prediction
2	A Multi-label Classifier Based On PSSM And GO For Predicting Protein Subcellular Localization
3	Predicting The Subcellular Localization Of Proteins With Multiple Sites Based On Multiple Features Fusion
4	Exploration Of Multi-label Classification In Bioinformatics
5	Research Of Multi-label Feature Selection Algorithms In The Form Of Nonlinear Programming
6	Uncertainty Measurement For Label-Incomplete Data And Its Applications
7	Extended Classification Methods And Their Interpretability Based On Axiomatic Fuzzy Sets
8	Research On The Problem Of Classification Of Time Series Data
9	Research On Prediction Of Sequence-based Multilocus Subcellular Localization
10	Research On Multi-label Classification Based On Decision Function