Research On Filling Method Of Missing Categorical Data

Posted on: 2019-04-06  Degree: Master  Type: Thesis
Country: China  Candidate: L L Pan  Full Text: PDF
GTID: 2437330572951808  Subject: Statistics
Abstract/Summary:
Handling missing data is an important part of data preprocessing. Missing values affect model estimation, model testing, and related steps, so studying how to deal with them effectively is of real significance. This thesis discusses methods for filling in missing values in categorical data. It assumes that each categorical attribute is determined by a latent variable that follows the standard normal distribution, and then considers two situations: data sets with no dependent variable, and data sets that contain a dependent variable (only the case of a single dependent variable is discussed). For the case with no dependent variable, a new filling algorithm, the TKNN filling algorithm, is proposed that operates on the transformed data. For the case with a dependent variable, the missing values are estimated with a regression equation after the transformation; this improved regression filling algorithm is called the TReg filling algorithm. Root mean square error (RMSE) is used as the evaluation index, and the results show that the algorithms fill the data sets effectively to a certain extent. The main conclusions are: 1. When the proportion of missing data is small, analysing only the complete cases gives good results. 2. The TReg and TKNN filling algorithms perform, to a certain extent, better than filling directly on the original data set. It is hoped that this work offers some reference value for the study of missing data and for data analysis.
Keywords/Search Tags: missing data, latent variable, near neighbour, categorical data, regression
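
The abstract does not spell out the transformation or the neighbour rule, so the following is only a rough sketch of the general idea behind a TKNN-style fill, not the thesis's actual algorithm. It assumes ordinal category codes, maps each category to a standard-normal score at the midpoint of its cumulative-proportion interval (an assumed stand-in for the latent-variable step), fills each missing cell from the k nearest rows in the transformed space, and maps the result back to the closest category. The function names `latent_scores`, `tknn_fill`, and `rmse`, the use of `None` for missing cells, and the toy data are all hypothetical.

```python
import math
from statistics import NormalDist

def latent_scores(column):
    """Map each observed category to a standard-normal score.

    Assumption: categories are ordinal, and a category's score is the
    normal quantile at the midpoint of its cumulative-proportion interval.
    """
    observed = [v for v in column if v is not None]
    scores, cum = {}, 0.0
    for level in sorted(set(observed)):
        p = observed.count(level) / len(observed)
        scores[level] = NormalDist().inv_cdf(cum + p / 2)
        cum += p
    return scores

def tknn_fill(rows, k=3):
    """Fill None cells using the k nearest rows in the transformed space."""
    n_cols = len(rows[0])
    maps = [latent_scores([r[j] for r in rows]) for j in range(n_cols)]
    z = [[None if v is None else maps[j][v] for j, v in enumerate(r)]
         for r in rows]
    filled = [list(r) for r in rows]
    for i, zi in enumerate(z):
        for j in range(n_cols):
            if zi[j] is not None:
                continue
            candidates = []
            for m, zm in enumerate(z):
                if m == i or zm[j] is None:
                    continue
                # Distance over the columns both rows have observed.
                shared = [c for c in range(n_cols)
                          if c != j and zi[c] is not None and zm[c] is not None]
                if not shared:
                    continue
                d = math.sqrt(sum((zi[c] - zm[c]) ** 2 for c in shared)
                              / len(shared))
                candidates.append((d, zm[j]))
            if not candidates:
                continue  # no usable neighbour: leave the cell missing
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            z_hat = sum(s for _, s in nearest) / len(nearest)
            # Map the averaged latent score back to the closest category.
            filled[i][j] = min(maps[j], key=lambda lev: abs(maps[j][lev] - z_hat))
    return filled

def rmse(true_vals, filled_vals):
    """Root mean square error between true and filled values."""
    return math.sqrt(sum((t - f) ** 2 for t, f in zip(true_vals, filled_vals))
                     / len(true_vals))

# Toy data: the last row has one missing cell (None).
rows = [[1, 1, 1], [1, 1, 1], [1, 1, 1],
        [3, 3, 3], [3, 3, 3], [3, 3, 3],
        [1, 1, None]]
filled = tknn_fill(rows, k=2)
print(filled[-1], rmse([1], [filled[-1][2]]))
```

A TReg-style variant would, under the same assumptions, replace the neighbour average with a prediction from a regression fitted on the transformed columns.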