Research On Two Phase Classification Method Based On Three-Way Decisions

Posted on: 2020-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Xu
Full Text: PDF
GTID: 2370330578967720
Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data, discovering and extracting valuable, latent knowledge from massive data has become an important task in data mining. Building efficient classification models and standardizing information are important research directions in this field, and classification is one of its basic analysis methods. Traditional classification models usually pursue higher classification accuracy and ignore misclassification cost. In practical problems, however, decision risks usually exist and differ from one decision to another, and people generally expect the misclassification cost of a decision to be minimal. As a classification model, three-way decisions introduce a minimum-risk cost and assign the samples that cannot be decided for the time being to the boundary region. This idea conforms to human cognition and provides a degree of tolerance to misclassification. It is therefore necessary to study how to apply three-way decisions to classification in data mining, and doing so has practical significance for current big data processing and applications. In three-way decision classification tasks, the boundary region contains few samples and lacks information, which results in low classification accuracy. Based on the idea of three-way decisions, this thesis designs two-phase classification models and algorithms that improve the partitioning of boundary-region data and raise classification accuracy. The main research contents are as follows:

(1) To address the low classification accuracy caused by the small number of samples and the loss of information in the boundary region, a two-phase classification model based on three-way decisions (TWD-TP) is proposed. In the first phase, taking misclassification cost into account, the conditional probability of each sample is constructed by Bayes' rule, the optimal loss function is solved to obtain the required thresholds, and the data set is divided according to the three-way decision rules. The partitioning is based on minimum-risk Bayesian decision theory. The resulting positive and negative regions contain a certain number of misclassified samples. In the second phase, the misclassified samples in the positive and negative regions are introduced into the boundary region as incremental information by means of the category label index. This adds classification information to the boundary region, which is then reconstructed. Finally, a classifier is applied to the reconstructed boundary region. The experimental results show that the proposed TWD-TP model not only selects the samples that are prone to misclassification in three-way decision classification, but also effectively classifies the samples in the reconstructed boundary region that could not otherwise be divided correctly, so that classification accuracy is further improved.

(2) In three-way decision classification, it is difficult to train a classifier effectively because the boundary region contains insufficient classification information. To solve this problem, a boundary-region Bagging ensemble method based on three-way decisions is presented. The method is completed in two phases. In the first phase, the data set is divided into three parts according to the three-way decision Bayes rules, which yields the positive region, the negative region, and the boundary region. In the second phase, the boundary-region samples are resampled by bootstrap sampling (self-sampling), a number of base learners with classification ability are trained on the resulting samples, and the trained base learners are combined by the Bagging ensemble method. In the voting stage, each base learner is weighted by probability, and the voting result is output. The experimental results demonstrate that the proposed algorithm can effectively divide the boundary region and improve the classification accuracy of three-way decision models.
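
The first-phase partition described above can be sketched as follows. The abstract does not state how the thresholds are obtained from the loss function, so this sketch assumes the standard closed-form thresholds of decision-theoretic rough sets; the `Losses` class, its default values, and the function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Losses:
    """Illustrative loss values (assumed, not taken from the thesis).

    First letter is the action (p = accept, b = defer, n = reject);
    second letter is the true state (p = in the class, n = not in it).
    """
    pp: float = 0.0
    bp: float = 2.0
    np: float = 6.0
    pn: float = 8.0
    bn: float = 2.0
    nn: float = 0.0


def thresholds(l: Losses) -> Tuple[float, float]:
    """Return (alpha, beta) from the minimum-risk Bayesian decision rule."""
    alpha = (l.pn - l.bn) / ((l.pn - l.bn) + (l.bp - l.pp))
    beta = (l.bn - l.nn) / ((l.bn - l.nn) + (l.np - l.bp))
    # A non-empty boundary region needs 0 <= beta < alpha <= 1.
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("losses do not yield 0 <= beta < alpha <= 1")
    return alpha, beta


def partition(probs: Sequence[float], alpha: float, beta: float) -> Dict[str, List[int]]:
    """Assign each sample index to POS, BND, or NEG by its conditional probability."""
    regions: Dict[str, List[int]] = {"POS": [], "BND": [], "NEG": []}
    for i, p in enumerate(probs):
        if p >= alpha:
            regions["POS"].append(i)   # accept
        elif p <= beta:
            regions["NEG"].append(i)   # reject
        else:
            regions["BND"].append(i)   # defer the decision
    return regions


alpha, beta = thresholds(Losses())
regions = partition([0.9, 0.75, 0.5, 0.34, 0.2], alpha, beta)
print(alpha, beta, regions)
```

With the illustrative losses above, alpha is 0.75 and beta is 1/3, so samples whose conditional probability lies strictly between these values are deferred to the boundary region.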
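
The second phase of contribution (2) can be sketched in a similar way. The abstract does not name the base learner or define the probability weighting, so this sketch makes two assumptions: the base learner is a hypothetical nearest-centroid model, and "weighted by probability" is read as each learner contributing its predicted class probabilities to the vote. The boundary-region samples are taken as given.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]          # (feature vector, class label)
Learner = Callable[[Sequence[float]], Dict[int, float]]


def bootstrap(data: List[Sample], rng: random.Random) -> List[Sample]:
    """Draw len(data) samples with replacement (bootstrap sampling)."""
    return [rng.choice(data) for _ in data]


def fit_centroid_learner(data: List[Sample]) -> Learner:
    """Hypothetical base learner: nearest centroid with probability-like scores."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for x, y in data:
        groups.setdefault(y, []).append(x)
    centroids = {
        y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()
    }

    def predict_proba(x: Sequence[float]) -> Dict[int, float]:
        # Inverse-distance scores, normalized so that they sum to 1.
        scores = {y: 1.0 / (1.0 + math.dist(x, c)) for y, c in centroids.items()}
        total = sum(scores.values())
        return {y: s / total for y, s in scores.items()}

    return predict_proba


def train_bagging(boundary: List[Sample], n_learners: int = 15, seed: int = 0) -> List[Learner]:
    """Train one base learner per bootstrap sample of the boundary region."""
    rng = random.Random(seed)
    return [fit_centroid_learner(bootstrap(boundary, rng)) for _ in range(n_learners)]


def vote(learners: List[Learner], x: Sequence[float]) -> int:
    """Sum each learner's class probabilities and return the top label."""
    totals: Dict[int, float] = {}
    for predict_proba in learners:
        for label, p in predict_proba(x).items():
            totals[label] = totals.get(label, 0.0) + p
    return max(totals, key=totals.get)


# Toy boundary-region samples (made up for illustration).
boundary = [
    ((0.0, 0.1), 0), ((0.2, 0.0), 0), ((0.1, 0.3), 0), ((0.3, 0.2), 0),
    ((5.0, 4.9), 1), ((4.8, 5.1), 1), ((5.1, 5.0), 1), ((5.2, 5.2), 1),
]
learners = train_bagging(boundary)
print(vote(learners, (0.2, 0.1)), vote(learners, (4.9, 5.2)))
```

Summing probabilities instead of counting hard votes lets a confident learner outweigh an uncertain one, which is one plausible reading of the weighting step; the thesis may define the weights differently.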
Keywords/Search Tags: Three-way decisions, Data mining, Two phase, Boundary region, Ensemble learning