Research on LSSVM Methods for Samples with Missing Data

Posted on: 2024-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: H Wu
Full Text: PDF
GTID: 2568307127953409
Subject: Software engineering
Abstract/Summary:
Classification is an important research area in machine learning with high practical value. With the continuous development of information collection technology and the mobile Internet, the scale and variety of data available for research keep growing, which has greatly promoted the development of classification methods. However, data are easily lost during collection or storage, which leaves data sets incomplete, and missing values can seriously degrade classifier performance. Handling missing data properly can therefore improve classifier performance. Many studies have proposed solutions to the problem of missing sample data, such as directly removing samples with missing values or repairing missing values using interpolation techniques. These studies tend to focus on how to improve overall performance on data with missing values, and rarely consider how the missing data in a data set affect classification. The research work in this thesis is as follows:

First, to address the classification of data with missing values and the effect of missing data on recognition, a Privileged Least Squares Support Vector Machine (P-LSSVM) is proposed. It is based on the Least Squares Support Vector Machine (LSSVM) and the Learning Using Privileged Information (LUPI) paradigm, and it processes the missing data and constructs the pattern classification model simultaneously. The basic idea is to express the importance of each feature (including features with missing values) through an additive kernel function, then derive privileged information from training on the complete data and construct the P-LSSVM from it, and finally apply leave-one-out cross-validation to obtain an unbiased identification of the importance of the missing features. Experiments on several mainstream data sets show that, compared with classical missing-data processing strategies, the P-LSSVM algorithm not only achieves better classification performance but also determines the importance of incomplete features at the same time. This provides a way to analyze sample features, which can guide the data collection process and help improve data quality.

Second, using samples that contain missing values indiscriminately during training risks degrading classifier performance. To determine the impact of such samples, the thesis analyzes missing data at the sample level and proposes a privileged learning classification algorithm that identifies the importance of samples with missing values while constructing the pattern classification model, namely a privileged least squares support vector machine for importance identification of missing samples (SPLSSVM). The algorithm aims both to improve classification performance and to ensure an unbiased determination of the effect of the missing samples on the classification error. This effect, treated as the importance of the missing samples, can assist data cleaning as an alternative way to improve data quality. The basic idea is to treat the training on complete features as privileged information that guides learning on the entire incomplete data set, express the importance of the missing samples using an additive kernel function and construct the SPLSSVM from it, and finally identify the importance of the missing samples without bias by leave-one-out cross-validation. Results on a standard data set and a case experiment show that the SPLSSVM algorithm not only achieves better overall classification performance but also determines the importance of the missing samples at the same time.
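The abstract does not give the P-LSSVM or SPLSSVM formulations, so the following is only a minimal sketch of the building blocks it names: a standard LSSVM, an additive kernel, and leave-one-out cross-validation. It is not the thesis's algorithm and omits the privileged-information term entirely. The function names (`additive_rbf_kernel`, `lssvm_fit`, `loo_error`), the choice of per-feature RBF kernels, the rule that a missing (NaN) feature contributes zero to the kernel, and the "drop one feature and compare leave-one-out error" importance heuristic are all assumptions made for illustration.

```python
import numpy as np

def additive_rbf_kernel(X, Z, weights, sigma=1.0):
    """Weighted sum of per-feature RBF kernels.
    Assumption: a feature that is NaN in either sample contributes 0."""
    K = np.zeros((X.shape[0], Z.shape[0]))
    for d, w in enumerate(weights):
        xd, zd = X[:, d][:, None], Z[:, d][None, :]
        observed = ~np.isnan(xd) & ~np.isnan(zd)
        diff = np.where(observed, xd - zd, 0.0)
        K += w * np.where(observed, np.exp(-diff**2 / (2 * sigma**2)), 0.0)
    return K

def lssvm_fit(K, y, gamma=10.0):
    """Solve the standard LSSVM linear system on +/-1 labels.
    Returns alpha, b and the leave-one-out predictions, obtained in closed
    form from the inverse system matrix: y_i - yhat_i(-i) = alpha_i / Cinv_ii."""
    n = len(y)
    C = np.zeros((n + 1, n + 1))
    C[:n, :n] = K + np.eye(n) / gamma   # regularised kernel matrix
    C[:n, n] = 1.0                      # bias column
    C[n, :n] = 1.0                      # constraint: sum(alpha) = 0
    C_inv = np.linalg.inv(C)
    sol = C_inv @ np.append(y, 0.0)
    alpha, b = sol[:n], sol[n]
    loo_pred = y - alpha / np.diag(C_inv)[:n]
    return alpha, b, loo_pred

def loo_error(X, y, weights, gamma=10.0):
    """Leave-one-out misclassification rate for given feature weights."""
    K = additive_rbf_kernel(X, X, weights)
    _, _, loo_pred = lssvm_fit(K, y, gamma)
    return float(np.mean(np.sign(loo_pred) != y))

# Synthetic data (hypothetical): feature 2 is pure noise and partly missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)
X[rng.random(40) < 0.3, 2] = np.nan

base = loo_error(X, y, weights=[1.0, 1.0, 1.0])
for d in range(3):
    w = [1.0, 1.0, 1.0]
    w[d] = 0.0  # remove feature d from the additive kernel
    print(f"feature {d}: LOO error {base:.3f} -> {loo_error(X, y, w):.3f} without it")
```

In this sketch, a feature whose removal raises the leave-one-out error would be read as more important. Because the kernel is a sum over features, setting one weight to zero removes that feature cleanly, which is one plausible reason an additive kernel suits importance analysis of incomplete features.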
Keywords/Search Tags: Least Squares Support Vector Machine, Learning Using Privileged Information, missing data, data cleaning, additive kernel function