Research Of Classification Methods On Binary Imbalanced Data

Posted on: 2022-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: S S Kong
Full Text: PDF
GTID: 2518306752969209
Subject: Computer application technology
Abstract/Summary:
Unbalanced data classification, an important research topic in machine learning and data mining, has attracted extensive attention in recent years, and unbalanced data is common in practical applications. Because the classes in such data sets are unevenly distributed, classifiers designed to guide learning by overall classification precision/accuracy are not well suited to the task. The major challenge in designing a classifier for unbalanced data is therefore to improve classification performance on the minority class while maintaining performance on the majority class. To improve minority-class performance on unbalanced data while maintaining global classification performance, this paper improves the Extreme Learning Machine (ELM) and Lattice Machine (LM) models, proposing an Extreme Learning Machine with initialized parameters based on external class invasion degree and a Lattice Machine based on boundary extension. In addition, an evaluation index based on confidence level is proposed and used together with common classification performance indexes to evaluate classifiers collaboratively. The work covers the following three aspects:

1. An Extreme Learning Machine with initialized parameters based on external class invasion degree is proposed. It addresses the problem that the randomly initialized input weights of an extreme learning machine do not reflect how well each feature distinguishes between samples of different classes. The distinguishing ability of each feature is calculated through the external class invasion degree, and the randomly initialized parameters are modified accordingly. For data containing two classes, the value distribution of each class under a given feature is calculated separately, and the invasion depth ratio and the invasion number ratio in the overlapping region are then used to compute the external class invasion degree of that feature. According to this invasion degree, the initial weights of features with strong class-distinguishing ability are increased in the extreme learning machine. Because the data are unbalanced, the features that best discriminate the minority class usually differ from those that best discriminate the majority class. The algorithm strengthens the weights of the highly discriminative features of both the minority class and the majority class, and can thus improve the classification performance of the extreme learning machine on unbalanced data sets. Compared with two improved algorithms based on extreme learning machine and three data-sampling-based SVM algorithms, the proposed algorithm achieves higher precision and F1-score on seven unbalanced data sets (2%-29% higher), and the algorithm is also stable.

2. A Lattice Machine algorithm based on boundary extension is proposed. The Lattice Machine is a classification learning method based on spatial coverage that builds hyper tuples to model the samples of each class. Given the sample distribution of unbalanced data, however, the data space covered by the positive hyper tuples (hyper tuples of the minority class) constructed by the Lattice Machine is much smaller than that covered by the negative hyper tuples (hyper tuples of the majority class), so the Lattice Machine classifies the minority class less well than the majority class. In addition, the Lattice Machine covers only part of the data space, which may lead to a low recall rate in classification tasks. To address these problems, a Lattice Machine based on boundary extension is proposed. Extending the positive hyper tuples outward allows them to be enlarged to the maximum extent, which improves the classification performance of the classifier on the minority class. Because the data space covered by the model is also expanded, the recall rate improves while accuracy is maintained. Compared with three data-sampling-based SVM algorithms, the extended Lattice Machine achieves higher precision and F1-score on nine unbalanced data sets (2%-19% higher).

3. A classification performance evaluation index based on confidence level is proposed. In unbalanced data classification, specific evaluation indexes are needed so that the actual classification performance on classes with different distributions is taken into account. For disease-related classification problems, the reliability of the classification results must also be considered: even if the results are highly accurate, results with low reliability will lack trust and be difficult to apply in practice. This paper therefore proposes a classification performance evaluation index based on confidence level. The method determines the confidence of a classifier's results by grouping the likelihood output of the classifier on each test sample into different ranges. The classification performance of the classifier on unbalanced data is then analyzed and judged by confidence level, precision, and F1-score together, so that a classifier with both high precision and high confidence can be identified. The experimental results show that the proposed confidence level index provides a trust evaluation of the classification results, indicating whether they can be trusted.
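The grouping step behind the confidence level index in the third contribution can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name `confidence_profile`, the range boundaries in `edges`, and the use of max(p, 1 - p) as the confidence of a binary prediction are all hypothetical choices, since the abstract does not specify them.

```python
from bisect import bisect_right


def confidence_profile(pos_probs, edges=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Return the share of test samples whose confidence falls in each range.

    pos_probs: predicted probability of the positive (minority) class,
               one value per test sample.
    edges:     assumed range boundaries; the actual ranges used in the
               thesis are not stated in the abstract.
    """
    counts = [0] * (len(edges) - 1)
    for p in pos_probs:
        # Assumed confidence measure for a binary prediction: the
        # probability of whichever label is predicted, in [0.5, 1].
        conf = max(p, 1.0 - p)
        # Locate the range [edges[i], edges[i + 1]) containing conf;
        # clamp so that conf == 1.0 falls into the last range.
        i = min(bisect_right(edges, conf) - 1, len(counts) - 1)
        counts[i] += 1
    total = len(pos_probs)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


# Five hypothetical test-sample outputs from some probabilistic classifier.
profile = confidence_profile([0.95, 0.05, 0.55, 0.72, 0.98])
```

A classifier whose samples concentrate in the highest range would, on this reading, be judged more trustworthy than one with the same precision and F1-score but most samples near 0.5. How the thesis combines the ranges into a single index is not described in the abstract.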
Keywords/Search Tags: Binary unbalanced data classification, Extreme learning machine, Lattice machine, Precision and F1-score, Confidence level