
Optimized Mahalanobis-Taguchi System Classification Method For High-Dimensional-Small-Sample-Size Imbalanced Data And Its Application Research

Posted on: 2023-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: T Mao
GTID: 2558307124478564
Subject: Management Science and Engineering
Abstract/Summary:
Data classification is a key research direction in data mining. In the era of big data it is both a popular academic topic and a practical problem in applied settings. Traditional classification algorithms require low data dimensionality and a sufficient sample size, and they assume that the data are evenly distributed among classes and that misclassification costs are equal. They often perform poorly on high-dimensional-small-sample-size imbalanced data. At the same time, with the development of information technology, such data appear in more and more fields and industries, such as manufacturing, finance, and the Internet industry.

The Mahalanobis-Taguchi System (MTS), an important method in quality engineering, can achieve effective dimensionality reduction and classification in multi-dimensional systems. It needs few samples, rests on a simple principle, is computationally efficient, and imposes no requirements on the data distribution. It classifies imbalanced data effectively and efficiently without resampling. However, when classifying high-dimensional-small-sample-size imbalanced data, MTS has several shortcomings: its orthogonal table restricts the feature dimension; the covariance matrix is singular when processing high-dimensional data; the signal-to-noise ratio gain is not necessarily the best strategy for feature selection; and the loss value of the quality loss function used to determine the threshold is subjective, so it is hard to judge whether the chosen value is correct. These shortcomings affect classification performance to a certain extent.

To address this, the study targets the classification of high-dimensional-small-sample-size imbalanced data and integrates proper orthogonal decomposition, mutual information, goal programming, and particle swarm optimization to optimize MTS and improve its performance on this type of data. The classification performance is verified on multiple data sets, and the algorithm is applied to monitoring of the semiconductor manufacturing process, where sensor data are analyzed to monitor the production status of the process. The main work of this study is as follows.

First, a Modified Mahalanobis-Taguchi System (MMTS) algorithm is constructed for high-dimensional-small-sample-size data classification. It addresses two problems: in the feature selection stage, the feature dimension is too large for the orthogonal table to be arranged when applying MTS; in the classifier construction stage, the covariance matrix is singular because the feature dimension is too large and the number of samples is too small. The study introduces an optimized form of proper orthogonal decomposition for feature selection, realizes feature dimensionality reduction, applies the algorithm to a UCI data set, and verifies it by comparison with baseline algorithms.

Second, an Improved Mahalanobis-Taguchi System (IMTS) algorithm is constructed for imbalanced data classification. In the feature selection stage, because the signal-to-noise ratio gain is not always the best feature selection strategy, the study proposes four principles: maximizing mutual information between features and classes; minimizing mutual information between features; maximizing the initial classification accuracy; and selecting features that produce not only the extreme value of the difference between the mean Mahalanobis distances of normal and abnormal samples but also the largest number of features. These principles are used to construct a goal programming problem that performs feature selection. In the classifier construction stage, to reduce the bias caused by researchers' subjectivity, the study applies the particle swarm optimization algorithm to determine the best threshold. To verify the algorithm, the study applies it to five data sets with different class-imbalance ratios and compares it with baseline algorithms.

Third, an Optimized Mahalanobis-Taguchi System (OMTS) ensemble algorithm is constructed for high-dimensional-small-sample-size imbalanced data classification and applied to semiconductor manufacturing process monitoring. MMTS for high-dimensional-small-sample-size data classification and IMTS for imbalanced data classification are combined into the OMTS ensemble algorithm, which is applied to a semiconductor manufacturing process monitoring data set to determine the production status of the manufacturing process by analyzing sensor data. The algorithm is verified by comparison with baseline algorithms.

The results show that the MMTS algorithm, the IMTS algorithm, and the OMTS ensemble algorithm have better classification performance than the baseline algorithms. They achieve effective dimensionality reduction without losing important feature information and improve classification accuracy, so they can effectively monitor the semiconductor manufacturing process and classify its production status.
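
As a rough illustration of the covariance-singularity issue the abstract mentions, the following minimal Python sketch builds a Mahalanobis space from normal samples and computes the scaled Mahalanobis distance used in standard MTS. It is not the thesis's MMTS method: the Moore-Penrose pseudo-inverse here is an assumed stand-in for the proper-orthogonal-decomposition step, and the function names, the `rcond` cutoff, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_mahalanobis_space(normal):
    """Estimate mean, std, and a (pseudo-)inverse correlation matrix
    from the normal (reference) samples, shape (n_samples, n_features)."""
    mean = normal.mean(axis=0)
    std = normal.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant features
    z = (normal - mean) / std
    corr = (z.T @ z) / (normal.shape[0] - 1)
    # Assumption: a pseudo-inverse replaces the ordinary inverse so the
    # computation still works when n_samples <= n_features (singular corr).
    # The cutoff 1e-10 is an illustrative choice, not taken from the thesis.
    corr_pinv = np.linalg.pinv(corr, rcond=1e-10)
    return mean, std, corr_pinv


def scaled_mahalanobis_distance(x, mean, std, corr_pinv):
    """Scaled Mahalanobis distance MD = z C^+ z^T / k for each row of x,
    where k is the number of features."""
    z = (np.atleast_2d(x) - mean) / std
    k = z.shape[1]
    return np.einsum("ij,jk,ik->i", z, corr_pinv, z) / k


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Well-posed case: many samples, few features.
    normal = rng.normal(size=(200, 5))
    space = fit_mahalanobis_space(normal)
    print("mean MD of normal group:",
          scaled_mahalanobis_distance(normal, *space).mean())

    # High-dimensional, small-sample case: 10 samples, 50 features.
    wide = rng.normal(size=(10, 50))
    z = (wide - wide.mean(axis=0)) / wide.std(axis=0, ddof=1)
    corr = z.T @ z / (wide.shape[0] - 1)
    print("rank of 50x50 correlation matrix:", np.linalg.matrix_rank(corr))
```

In the well-posed case the mean scaled distance of the normal group is close to 1, which is the usual MTS sanity check. In the second case the correlation matrix is rank-deficient, so an ordinary inverse is unavailable and some form of dimensionality reduction or regularization is required before distances can be computed.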
Keywords/Search Tags: Mahalanobis-Taguchi System, high-dimensional-small-sample-size, imbalanced data, data classification