Study Of The Classification Method Of Imbalanced Multi-Label Data Based On Label Correlation

Posted on: 2024-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: J C Duan
GTID: 2568307157451234
Subject: Master of Electronic Information (Professional Degree)

Abstract/Summary:
With the growing use of data in modern society, traditional single-label classification methods cannot meet the demands of processing high-dimensional, multi-label information, so multi-label classification methods have been widely adopted. However, as data complexity increases, the problems faced by multi-label learning also become more complex. On the one hand, class imbalance is widespread, and more severe, in multi-label data. On the other hand, mining and exploiting label correlations is a topic that cannot be ignored. The multi-label learning field has developed various methods to address these problems. Overall, however, these methods either do not consider class imbalance or ignore the correlation between labels. Even the algorithms that consider both problems often have flaws in their approach, which can make them seriously misleading in certain classification scenarios. This thesis proposes two more robust and general solutions. The research content and contributions of this study cover the following two points:

1. MLHC: A hierarchical clustering-based imbalanced multi-label learning algorithm

MLHC addresses the class imbalance and label correlation problems by considering the hierarchical structure of the label set. The algorithm partitions the label set by hierarchical clustering, grouping labels into several strongly correlated label clusters, and then transforms the multi-label learning problem into a multi-class learning problem through a series of operations such as "Encoding" and "Decoding". This allows traditional imbalanced classifiers to take part directly in multi-label learning while still accounting for the correlation between labels. MLHC can therefore both handle class imbalance and make better use of label correlation, improving the performance of multi-label learning. The experimental results demonstrate the effectiveness and feasibility of exploring label correlation and addressing class imbalance by transforming the label set.

2. ECC++: An improved algorithm family based on Ensemble of Classifier Chains

ECC++ is an improved algorithm family based on Ensemble of Classifier Chains (ECC). It combines ECC with three traditional class imbalance learning methods (sampling, cost-sensitive learning, and threshold-moving), each combination forming a member of the ECC++ family, to address label correlation and class imbalance simultaneously. In the experiments, the algorithm family also uses Extreme Learning Machines and Support Vector Machines as control groups to show that the results are independent of the classifier, ensuring the objectivity and fairness of the results. The experiments confirm the effectiveness and feasibility of exploring label correlation and addressing class imbalance from the perspective of expanding the feature set.

The biggest difference between MLHC and ECC++ lies in how they explore and exploit label correlations. MLHC starts from the label set and mines label correlations through clustering, "Encoding", and "Decoding" operations, while ECC++ extends label correlations into the data features. The biggest similarity is that both inherit the idea of problem transformation and are high-order label correlation strategies that take class imbalance into account.

To verify the effectiveness and superiority of the two algorithms, this study conducted experiments on 12 different multi-label datasets and compared them with several popular multi-label class imbalance algorithms. The experimental results show that both MLHC and ECC++ perform well on the Macro-F1 and Micro-F1 metrics and, compared with the other algorithms, handle label correlation and class imbalance better.
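
The label-set transformation described for MLHC can be illustrated with a minimal sketch. The abstract does not state which correlation measure, linkage rule, or encoding scheme the thesis uses, so everything here is an assumption made for illustration: Jaccard similarity between label columns, average-linkage agglomerative clustering, and a simple codebook that maps each label combination inside a cluster to one class id. The function names (`jaccard`, `cluster_labels`, `encode`, `decode`) and the toy matrix `Y` are hypothetical.

```python
from itertools import combinations

def jaccard(Y, a, b):
    """Jaccard similarity of label columns a and b (assumed correlation measure)."""
    both = sum(1 for row in Y if row[a] and row[b])
    either = sum(1 for row in Y if row[a] or row[b])
    return both / either if either else 0.0

def cluster_labels(Y, n_clusters):
    """Average-linkage agglomerative clustering of label indices (assumed linkage)."""
    n_labels = len(Y[0])
    clusters = [[j] for j in range(n_labels)]
    sim = {(a, b): jaccard(Y, a, b) for a, b in combinations(range(n_labels), 2)}

    def linkage(c1, c2):
        pairs = [(min(a, b), max(a, b)) for a in c1 for b in c2]
        return sum(sim[p] for p in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two most similar clusters (i < j, so deleting j is safe).
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters

def encode(Y, cluster):
    """Map each distinct label combination inside the cluster to a class id."""
    codebook, classes = {}, []
    for row in Y:
        combo = tuple(row[j] for j in cluster)
        classes.append(codebook.setdefault(combo, len(codebook)))
    return classes, codebook

def decode(classes, codebook, cluster, n_labels):
    """Inverse of encode: class ids back to label bits at the cluster's positions."""
    inverse = {cid: combo for combo, cid in codebook.items()}
    rows = []
    for cid in classes:
        row = [0] * n_labels
        for j, bit in zip(cluster, inverse[cid]):
            row[j] = bit
        rows.append(row)
    return rows

# Toy label matrix: labels 0/1 tend to co-occur, as do labels 2/3.
Y = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
clusters = cluster_labels(Y, n_clusters=2)

# Encode each cluster into a multi-class target, then decode and recombine.
restored = [[0] * len(Y[0]) for _ in Y]
for cluster in clusters:
    classes, codebook = encode(Y, cluster)
    # `classes` is the target an imbalance-aware multi-class learner would see.
    for i, row in enumerate(decode(classes, codebook, cluster, len(Y[0]))):
        restored[i] = [a | b for a, b in zip(restored[i], row)]
```

In this toy case the multi-class target for the first cluster contains one class with a single sample, which is the kind of skew an imbalanced multi-class classifier would then be asked to handle. In a real pipeline the decoded class ids would come from the classifier's predictions rather than from the training targets.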
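
The idea of carrying label correlation into the feature set, which ECC++ inherits from classifier chains, can be sketched as follows. This shows only the standard classifier-chain construction of training data, where each link sees the original features plus the labels that come earlier in the chain, together with the random chain orders an ensemble would use. The imbalance handling that ECC++ adds (sampling, cost-sensitive learning, or threshold-moving) would be applied when training each link and is not shown. The function names and toy data are hypothetical.

```python
import random

def chain_training_sets(X, Y, order):
    """Build one training set per chain link.

    Link k predicts label order[k] from the original features extended
    with the true values of labels order[0..k-1].
    """
    sets = []
    for k, label in enumerate(order):
        earlier = order[:k]
        X_aug = [x + [y[j] for j in earlier] for x, y in zip(X, Y)]
        target = [y[label] for y in Y]
        sets.append((label, X_aug, target))
    return sets

def ensemble_orders(n_labels, n_chains, seed=0):
    """Random label orders, one per chain in the ensemble."""
    rng = random.Random(seed)
    orders = []
    for _ in range(n_chains):
        order = list(range(n_labels))
        rng.shuffle(order)
        orders.append(order)
    return orders

# Toy data: two instances, two features, three labels.
X = [[0.1, 0.2], [0.3, 0.4]]
Y = [[1, 0, 1], [0, 1, 1]]

links = chain_training_sets(X, Y, order=[2, 0, 1])
orders = ensemble_orders(n_labels=3, n_chains=5)
```

Each entry of `links` is a binary learning problem, so any binary imbalance technique can be attached to it without changing how the chain itself is built.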
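
Since the evaluation relies on Macro-F1 and Micro-F1, a small sketch of how the two averages differ may help. Macro-F1 averages the per-label F1 scores, so a rare label counts as much as a frequent one. Micro-F1 pools true positives, false positives, and false negatives over all labels first, so frequent labels dominate. Returning 0.0 for a label with no positives and no predictions is an assumed convention, and the toy data is hypothetical.

```python
def f1(tp, fp, fn):
    """F1 from counts; 0.0 when undefined (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def label_counts(Y_true, Y_pred):
    """Per-label (tp, fp, fn) counts for binary indicator matrices."""
    counts = []
    for j in range(len(Y_true[0])):
        tp = sum(1 for t, p in zip(Y_true, Y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(Y_true, Y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(Y_true, Y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    return counts

def macro_f1(Y_true, Y_pred):
    """Unweighted mean of per-label F1 scores."""
    counts = label_counts(Y_true, Y_pred)
    return sum(f1(*c) for c in counts) / len(counts)

def micro_f1(Y_true, Y_pred):
    """F1 computed from counts pooled over all labels."""
    counts = label_counts(Y_true, Y_pred)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f1(tp, fp, fn)

# Label 0 is frequent, label 1 is rare; the predictor never outputs label 1.
Y_true = [[1, 0], [1, 0], [1, 0], [1, 1]]
Y_pred = [[1, 0], [1, 0], [1, 0], [1, 0]]

macro = macro_f1(Y_true, Y_pred)  # 0.5: the missed rare label halves the score
micro = micro_f1(Y_true, Y_pred)  # 8/9: the frequent label dominates
```

A predictor that ignores the rare label is penalized heavily by Macro-F1 but only mildly by Micro-F1, which is why reporting both is informative on imbalanced multi-label data.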
Keywords/Search Tags: Multi-label Learning, Class Imbalance, Label Correlation, Hierarchical Clustering, Ensemble of Classifier Chains