In the field of health and medical services, data analysis techniques are commonly applied to help doctors diagnose diseases. Chronic diseases are complex and often give rise to comorbidities, so a patient may suffer from more than one disease at the same time. To help a doctor discover further potential chronic diseases of a patient, a chronic disease aided-diagnosis model should suggest multiple possible diseases while the doctor is diagnosing the patient. Multi-label learning methods output multiple relevant labels for a sample to be predicted, so such a model can be trained with multi-label learning methods. However, existing multi-label learning methods have limitations when analyzing medical data. To improve the performance of the chronic disease aided-diagnosis model, this paper studies several key problems of multi-label learning methods in analyzing medical data. The main research contributions of this paper are as follows:

(1) Class imbalance exists in medical data and may degrade the performance of most multi-label learning methods. To alleviate the influence of class imbalance, a cross-coupling aggregation method integrating a regularized ensemble of multi-class classification is proposed. For each label, the method first decomposes the multi-label learning task into one binary classification task and multiple multi-class classification tasks. It then trains a binary classifier, and trains the multi-class classifiers using a regularized ensemble of multi-class classification. Finally, to enhance its capability of handling class-imbalanced data, it aggregates the binary classifier and the multi-class classifiers to predict the relevant labels of a sample. Experimental results on a medical dataset verify the superiority of the proposed method in dealing with class-imbalanced medical data.

(2) A multi-label learning model trained on all symptoms in the medical data may have lower accuracy and efficiency. To deal with this problem, multi-label learning with label-specific features using feature information (LSF-FI) is proposed. It learns label-specific features for each label while considering both the correlation information in the label space and the correlation information in the feature space. In LSF-FI, instance correlations in the feature space are computed with a probabilistic neighborhood graph model, and label correlations in the label space are computed by cosine similarity. For multi-label data, LSF-FI can both select label-specific features for each label and classify an unseen instance into a set of relevant labels. Experimental results on a medical dataset demonstrate that LSF-FI can improve the performance of the chronic disease aided-diagnosis model. Moreover, experiments on other public datasets verify the applicability and superiority of LSF-FI.

(3) To improve the accuracy of the multi-label learning model by analyzing the correlations among multiple diseases, a fast random k-labelsets method based on label correlations is proposed. To make full use of label correlations, the method first calculates the k nearest neighbors of each sample based on Euclidean distance. It then constructs a neighbor label indices matrix and a neighbor label absence matrix for each label. The positive correlation of two labels is measured by the similarity of their neighbor label indices matrices, and their negative correlation by the similarity of their neighbor label absence matrices. Finally, the method selects the k-labelsets according to the positive and negative correlations of their labels. After the k-labelsets are selected, the method aims at efficient multi-label learning: for each k-labelset, a regression model divides the instances into a related class and an unrelated class, and a label powerset (LP) model is then trained on the related instances. Experiments on a medical dataset verify that the proposed method improves both the accuracy and the efficiency of the chronic disease aided-diagnosis model. Furthermore, experimental results on other public datasets show that the proposed method achieves outstanding classification performance.
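Contribution (1) decomposes each label's task into one binary task plus several multi-class tasks and then aggregates their outputs. The Python sketch below illustrates only the decomposition and aggregation steps, loosely modeled on cross-coupling aggregation as it is commonly described. The three-class encoding, the summed-probability score, and the threshold are assumptions; trained classifiers are abstracted as probability functions, and the thesis's regularized ensemble is not reproduced. All function names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
LabelMatrix = List[List[int]]  # n samples x q labels, entries 0/1


def binary_targets(Y: LabelMatrix, j: int) -> List[int]:
    """Targets of the binary task for label j: 1 if label j is relevant."""
    return [row[j] for row in Y]


def coupled_targets(Y: LabelMatrix, j: int, k: int) -> List[int]:
    """Targets of the multi-class task coupling label j with label k.

    Assumed encoding: 0 = neither j nor k relevant, 1 = only k relevant,
    2 = j relevant (k either way), so the minority side of label j stays
    in a single class.
    """
    targets = []
    for row in Y:
        if row[j] == 1:
            targets.append(2)
        elif row[k] == 1:
            targets.append(1)
        else:
            targets.append(0)
    return targets


def aggregate_score(
    x: Vector,
    binary_prob: Callable[[Vector], float],
    coupled_probs: List[Callable[[Vector], List[float]]],
) -> float:
    """Evidence that label j is relevant for x: P(y_j = 1 | x) from the binary
    classifier plus P(class 2 | x) from every coupled multi-class classifier."""
    return binary_prob(x) + sum(proba(x)[2] for proba in coupled_probs)


def predict_label(
    x: Vector,
    binary_prob: Callable[[Vector], float],
    coupled_probs: List[Callable[[Vector], List[float]]],
    threshold: float,
) -> int:
    """Predict label j as relevant when the aggregated score exceeds the
    threshold (assumed to be tuned per label, e.g. on validation data)."""
    return int(aggregate_score(x, binary_prob, coupled_probs) > threshold)


Y = [[1, 0, 1], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
print(binary_targets(Y, 0))      # [1, 0, 0, 1]
print(coupled_targets(Y, 0, 1))  # [2, 1, 0, 2]
```

Merging the two "label j relevant" cases into one class is one way to keep the rare positive side of an imbalanced label from being split further; the thesis may use a different multi-class encoding.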
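In contribution (2), LSF-FI computes label correlations in the label space by cosine similarity. The sketch below computes that pairwise cosine similarity between the label columns of a 0/1 label matrix. It covers only this label-space step; the probabilistic neighborhood graph for the feature space and the feature-selection objective are not reproduced. Treating a label that never occurs as uncorrelated with every label (similarity 0) is an assumption of this sketch, and the function name is hypothetical.

```python
import math
from typing import List

LabelMatrix = List[List[int]]  # n samples x q labels, entries 0/1


def label_cosine_matrix(Y: LabelMatrix) -> List[List[float]]:
    """Pairwise cosine similarity between the label columns of Y."""
    q = len(Y[0])
    cols = [[row[l] for row in Y] for l in range(q)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    C = [[0.0] * q for _ in range(q)]
    for a in range(q):
        for b in range(q):
            if norms[a] == 0 or norms[b] == 0:
                continue  # a label that never occurs has no direction; keep 0
            dot = sum(u * v for u, v in zip(cols[a], cols[b]))
            C[a][b] = dot / (norms[a] * norms[b])
    return C


Y = [[1, 0, 1], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
C = label_cosine_matrix(Y)
print([[round(v, 3) for v in row] for row in C])
# [[1.0, 0.5, 0.5], [0.5, 1.0, 0.0], [0.5, 0.0, 1.0]]
```

For 0/1 labels, the cosine similarity of two columns is the number of samples carrying both labels divided by the geometric mean of the two label frequencies, so frequent labels do not dominate the correlation simply by occurring often.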
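The neighbor-based correlation step of contribution (3) can be sketched as follows. Only the k-nearest-neighbor search by Euclidean distance comes from the text. The matrix definitions (entry [i][t] of a label's indices matrix is 1 when the t-th neighbor of sample i carries that label, and the absence matrix is its complement) and the use of Jaccard similarity between matrices are assumptions made for illustration; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def knn_indices(X: List[Sequence[float]], k: int) -> List[List[int]]:
    """Indices of the k nearest neighbors of each sample (Euclidean, self excluded)."""
    neighbors = []
    for i, xi in enumerate(X):
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def neighbor_label_matrices(Y: Matrix, neighbors: List[List[int]], label: int) -> Tuple[Matrix, Matrix]:
    """Assumed construction: the indices matrix marks neighbors that carry `label`,
    the absence matrix marks neighbors that lack it."""
    present = [[Y[j][label] for j in nbrs] for nbrs in neighbors]
    absent = [[1 - v for v in row] for row in present]
    return present, absent


def jaccard(M1: Matrix, M2: Matrix) -> float:
    """Jaccard similarity between the 1-entries of two equally shaped 0/1 matrices."""
    inter = union = 0
    for r1, r2 in zip(M1, M2):
        for a, b in zip(r1, r2):
            inter += a & b
            union += a | b
    return inter / union if union else 0.0


def label_pair_correlations(Y: Matrix, neighbors: List[List[int]], a: int, b: int) -> Tuple[float, float]:
    """(positive, negative) correlation of labels a and b: similarity of their
    indices matrices and similarity of their absence matrices."""
    pa, aa = neighbor_label_matrices(Y, neighbors, a)
    pb, ab = neighbor_label_matrices(Y, neighbors, b)
    return jaccard(pa, pb), jaccard(aa, ab)


X = [[0, 0], [0, 1], [5, 5], [5, 6]]
Y = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
nb = knn_indices(X, k=1)
print(nb)                                     # [[1], [0], [3], [2]]
print(label_pair_correlations(Y, nb, 0, 1))   # (0.6666666666666666, 0.5)
```

Because the matrices are built from neighbors rather than from each sample's own labels, two labels count as correlated when they tend to appear (or be absent) together in the same local neighborhoods of the feature space.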
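For each selected k-labelset, contribution (3) separates related from unrelated instances and trains a label powerset (LP) model on the related ones. The sketch below shows the LP transformation for one k-labelset. Defining a related instance as one carrying at least one label of the k-labelset is an assumption, and the split is read directly from the training labels here instead of being predicted by the regression model the text describes. Function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

LabelMatrix = List[List[int]]  # n samples x q labels, entries 0/1


def related_mask(Y: LabelMatrix, labelset: Sequence[int]) -> List[bool]:
    """Assumed 'related' class: the sample has at least one label of the labelset."""
    return [any(row[l] == 1 for l in labelset) for row in Y]


def label_powerset(
    Y: LabelMatrix, labelset: Sequence[int]
) -> Tuple[List[int], Dict[Tuple[int, ...], int]]:
    """Map each related sample's labels restricted to `labelset` to one
    multi-class target; distinct label combinations become distinct classes."""
    classes: Dict[Tuple[int, ...], int] = {}
    targets: List[int] = []
    for row, related in zip(Y, related_mask(Y, labelset)):
        if not related:
            continue  # unrelated samples are left out of LP training
        combo = tuple(row[l] for l in labelset)
        targets.append(classes.setdefault(combo, len(classes)))
    return targets, classes


def decode(class_id: int, classes: Dict[Tuple[int, ...], int]) -> Tuple[int, ...]:
    """Invert the LP mapping: recover the label combination of a predicted class."""
    inverse = {cid: combo for combo, cid in classes.items()}
    return inverse[class_id]


Y = [[1, 0, 1], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
targets, classes = label_powerset(Y, (0, 1))
print(targets)             # [0, 1, 2]
print(decode(2, classes))  # (1, 1)
```

Dropping unrelated instances removes the all-zero combination from the LP problem, which shrinks the training set for each k-labelset; this is one plausible reading of why the split helps efficiency.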