Research On Directed Acyclic Graph Based Hierarchical Multi-label Classification Method

Posted on: 2020-12-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Feng
GTID: 1360330590472803
Subject: Information and Communication Engineering
Abstract/Summary:
Multi-label classification is widely used in image classification, information processing, fault diagnosis, and gene function prediction. If the labels of an instance conform to a predefined hierarchical structure, the multi-label classification problem becomes a more complex hierarchical multi-label classification problem. In a directed acyclic graph (DAG), a node can have more than one parent node, so hierarchical multi-label classification algorithms designed for tree structures are not suitable for DAG structures. Existing research focuses mainly on hierarchical multi-label classification for tree structures; theoretical analysis of DAG hierarchical multi-label classification is relatively scarce, and research on its mathematical model is insufficient. In addition, the hierarchical structure leads to imbalanced datasets, which can affect classification ability and accuracy. Few DAG hierarchical multi-label classification algorithms currently exist, their accuracy is low, and they cannot adequately meet the requirements of real applications. An important application area of hierarchical multi-label classification is gene function prediction. Because the widely used Gene Ontology (GO) annotation has a directed acyclic graph structure, the GO-based gene function prediction problem can be cast as a DAG hierarchical multi-label classification problem. Research on the DAG hierarchical multi-label classification problem is therefore of great significance: it can advance the theoretical study of hierarchical multi-label classification, accelerate gene function verification and annotation, and serve as a reference for solving related problems in other areas.

The main contents of this paper are as follows:

First, to address the current lack of theoretical analysis and mathematical models for DAG hierarchical multi-label classification, this paper designs a new loss function, the DAGH loss function, which integrates hierarchy information and assigns different costs to the different prediction errors that may occur at parent and child nodes. The conditional risk of hierarchical multi-label classification is then derived from the DAGH loss function, so that, by Bayesian decision theory under the minimum-risk principle, the hierarchical multi-label classification problem becomes a conditional risk minimization problem. A mathematical model for solving the multi-label classification problem is constructed through mathematical derivation and simplification of this optimization problem. This paper also gives the specific solution procedure and main steps for the hierarchical multi-label classification problem. The proposed mathematical model transforms the complex DAG hierarchical multi-label classification problem into a set of binary classification problems, providing a theoretical foundation for designing DAG hierarchical multi-label classification algorithms and for solving the DAG hierarchical multi-label classification problem.

Second, the imbalanced dataset problem in hierarchical multi-label classification becomes more pronounced as the hierarchy deepens. For solving the problem with the proposed mathematical model, this paper proposes a method for generating the training set of each node in the DAG. To generate a training set for a node, an improved siblings strategy is first used to select positive and negative samples and build the original training set; because this strategy takes hierarchical information into account when constructing the training set, it can alleviate the imbalance of the dataset. The original training set is then processed with the proposed clustering-based hybrid sampling method (CHS), which turns it into a balanced training set. The proposed method can generate a balanced training set at each node, which helps alleviate the impact of imbalanced datasets on the classification results.

Third, because only a few algorithms exist for DAG hierarchical multi-label classification, and these have low accuracy and cannot meet application requirements, a DAG hierarchical multi-label classification algorithm, the HMC-DAG algorithm, is proposed on the basis of the mathematical model constructed in this paper. The algorithm uses the proposed training set generation method to construct the training set of each node, which effectively alleviates the imbalanced dataset problem at the data level. The HMC-DAG algorithm places no special requirement on the type of binary classifier, so the classifier can be chosen flexibly as needed and the latest results of classification research in machine learning can be exploited. Two HMC-DAG variants are presented, using SVM and MLP as base classifiers: the HMC-DAG-SVM algorithm and the HMC-DAG-MLP algorithm. A DAGLabel greedy algorithm is designed and incorporated into the HMC-DAG algorithm to solve the optimization problem described by the proposed mathematical model. The DAGLabel greedy algorithm obtains the optimal classification result subject to the requirement that the result satisfies the hierarchy constraint. Experimental results show that the proposed algorithm effectively solves the DAG hierarchical multi-label classification problem and performs better than other state-of-the-art algorithms.
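
The hierarchy constraint on a DAG can be illustrated with a small sketch. This is not the DAGLabel greedy algorithm of the dissertation, whose details the abstract does not give. It only assumes the convention commonly applied to GO annotations: a predicted label set is consistent when every predicted label also has all of its ancestors predicted. The toy DAG and the names `PARENTS`, `ancestors`, `satisfies_hierarchy_constraint`, and `close_upward` are hypothetical.

```python
from functools import lru_cache

# Hypothetical toy DAG: each label maps to its parents.
# "d" has two parents, which a tree hierarchy could not express.
PARENTS = {
    "root": (),
    "a": ("root",),
    "b": ("root",),
    "c": ("a",),
    "d": ("a", "b"),
}


@lru_cache(maxsize=None)
def ancestors(label):
    """Return every ancestor of `label`, following all parent edges."""
    found = set()
    for parent in PARENTS[label]:
        found.add(parent)
        found |= ancestors(parent)
    return frozenset(found)


def satisfies_hierarchy_constraint(predicted):
    """True if every predicted label has all of its ancestors predicted too."""
    return all(ancestors(label) <= predicted for label in predicted)


def close_upward(predicted):
    """Add the missing ancestors so the label set becomes consistent.

    Assumption: this simple repair is only an illustration, not the
    dissertation's optimization procedure.
    """
    closed = set(predicted)
    for label in predicted:
        closed |= ancestors(label)
    return closed


# "d" is predicted, but its second parent "b" is not.
inconsistent = {"root", "a", "d"}
print(satisfies_hierarchy_constraint(inconsistent))
print(sorted(close_upward(inconsistent)))
```

Here the set `{"root", "a", "d"}` violates the constraint because `d` also descends from `b`; adding `b` repairs it. In a tree, checking a single parent chain would suffice, which is one reason tree-oriented methods do not carry over directly to a DAG.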
Keywords/Search Tags: Multi-label classification, Hierarchical multi-label classification, DAG, Loss function, Hierarchy constraint