
Research On Imbalanced Data Classification Using Stochastic Configuration Networks

Posted on: 2024-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: C F Ning
Full Text: PDF
GTID: 2568307118976319
Subject: Control Science and Engineering
Abstract/Summary:
As data availability expands at an unprecedented pace, imbalanced data has become pervasive in industrial production and daily life, for example in machinery fault diagnosis, status monitoring, medical diagnosis, and network intrusion detection. Traditional classification algorithms optimize overall accuracy under the assumption that the classes are balanced in size or that all misclassification costs are equal. On imbalanced classification tasks, the numerical dominance of the majority class overwhelms the minority class during learning, so the minority class is largely ignored and its classification accuracy is poor. Constructing classification algorithms that improve minority-class accuracy is therefore an urgent problem. In the age of big data, as data grow in size and complexity, classification methods based on neural networks have emerged in large numbers. Stochastic configuration networks (SCNs), a novel incremental learning method, are a representative example. SCNs have been widely used in industrial process modeling and data analysis owing to their fast convergence, good generalization performance, and universal approximation property. However, existing SCNs and their variants do not account for imbalanced data. Building on the theory of SCNs, this thesis therefore develops improved SCNs for imbalanced data classification. The main work and contributions are as follows.

(1) An imbalanced data classification method using SCNs based on sample size and density is proposed. The construction process of SCNs does not consider class imbalance, which leads to poor classification performance, so a weight matrix is constructed to assign different weights to different classes. To address imbalance in both sample size and distribution, a balance factor is introduced to alleviate the performance bias caused by the distribution, and fuzzy membership is employed to weaken the impact of outliers on classification performance. Finally, the universal approximation property is proved, and the superiority and effectiveness of the method on imbalanced data classification are demonstrated.

(2) Hybrid density based class-specific SCNs for imbalanced data classification are proposed. Class overlap and small disjuncts seriously degrade classification performance. A hybrid density is therefore introduced to evaluate the distribution of samples within and between classes, so as to exploit more of the distribution information among samples. In addition, the degree of dispersion among samples is inconsistent across classes, which is handled by assigning different regularization parameters to different classes. Finally, classification tasks on standard machine learning datasets verify that the proposed method is superior to the SCNs based on sample size and density.

(3) A joint probability based SCN method for imbalanced classification is proposed. Manually setting weight coefficients according to the importance of different classes in an imbalanced data set lacks a theoretical basis. A density factor and a distance factor are therefore adopted to evaluate the amount of information carried by the minority class, and the weights are updated according to that amount of information. The experimental results demonstrate that the proposed method significantly outperforms both the SCNs based on sample size and density and the hybrid density based class-specific SCNs.
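
The idea of assigning different weights to different classes through a weight matrix can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes a fixed random sigmoid hidden layer instead of the incremental, supervised node construction of SCNs, and it assumes a simple inverse-frequency weighting with no balance factor or fuzzy membership. The names `class_weights` and `weighted_output_weights` are hypothetical.

```python
import numpy as np

def class_weights(y):
    """Per-sample weights inversely proportional to class frequency.

    Assumption: a common heuristic, not the weighting used in the thesis.
    """
    classes, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    per_class = len(y) / (len(classes) * counts)
    return per_class[inverse]

def weighted_output_weights(H, T, w, reg=1e-3):
    """Solve min_beta sum_i w_i * ||h_i beta - t_i||^2 + reg * ||beta||^2.

    With W = diag(w), the closed form is beta = (H^T W H + reg I)^{-1} H^T W T.
    """
    HW = H * w[:, None]                      # row scaling, equals W @ H
    A = H.T @ HW + reg * np.eye(H.shape[1])
    return np.linalg.solve(A, HW.T @ T)

# Toy imbalanced data: 90 majority samples, 10 minority samples (synthetic).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (90, 2)), rng.normal(2.0, 1.0, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
T = np.eye(2)[y]                             # one-hot targets

# Fixed random hidden layer (assumption: stands in for SCN hidden nodes).
W_in = rng.uniform(-1.0, 1.0, (2, 20))
b = rng.uniform(-1.0, 1.0, 20)
H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))

w = class_weights(y)
beta = weighted_output_weights(H, T, w)
pred = np.argmax(H @ beta, axis=1)
minority_recall = np.mean(pred[y == 1] == 1)
print("minority recall:", minority_recall)
```

With this weighting each class contributes the same total weight to the loss, so the 10 minority samples count as much as the 90 majority samples when the output weights are fitted. Passing `np.ones(len(y))` as `w` recovers the ordinary unweighted ridge solution for comparison.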
Keywords/Search Tags: stochastic configuration networks, imbalanced data classification, hybrid density, class-specific, joint probability