Comparative Research On Data Filling Algorithms Under Different Missing Mechanisms

Posted on: 2022-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Zheng
GTID: 2480306779978639
Subject: Automation Technology
Abstract/Summary:
Missing data is a common phenomenon in statistical surveys and research, and it often makes statistical inference unreliable. Common ways of handling missing data include deleting the sample points or observed variables that contain missing values, leaving the data unprocessed, and filling in the missing values. This paper studies the application scenarios and parameter optimization of common filling algorithms under different missing mechanisms and different missing rates, and compares the improved algorithms on examples, in order to provide a reference for practical application. First, the three missing mechanisms, namely Missing Completely at Random, Missing at Random and Not Missing at Random, are described mathematically, and simulation code for each of them is given. Then the K-Nearest Neighbor algorithm is optimized with cross-validation and Gaussian function weighting, and a Weighted K-Nearest Neighbor filling algorithm is proposed. In that algorithm, the neighbor distances computed for the missing values of different sample points differ, which leads to unreasonable weight allocation. To address this, the parameters of the Gaussian function are adjusted dynamically according to the specific observed data set, and a Dynamic Weighted K-Nearest Neighbor filling algorithm is proposed. Theoretical study and empirical analysis show that this method improves the filling effect and has a certain generality. Finally, the Weighted K-Nearest Neighbor filling algorithm relies too heavily on adjacent sample points, which reduces its stability, so this paper uses the Missing Forest algorithm to calibrate its filling results. Because the performance of filling algorithms declines gradually as the missing rate increases, an iterative method is used to gradually reduce the missing rate of the incomplete data set during the filling process. On this basis, the Weighted K-Nearest Neighbor and Missing Forest Hybrid Iterative filling algorithm is proposed. Empirical analysis shows that, across different missing mechanisms and missing rates, this method inherits both the filling accuracy of the Weighted K-Nearest Neighbor algorithm and the stability of the Missing Forest algorithm.
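
The abstract mentions simulation code for the three missing mechanisms and a Gaussian-weighted K-Nearest Neighbor fill, but the thesis code itself is not reproduced here. The following Python sketch is only an illustration of those two ideas under stated assumptions: the function names `make_mask` and `gaussian_knn_impute` are hypothetical, only column 0 loses values, column 1 is assumed to be the fully observed variable that drives Missing at Random, and `k` and `sigma` are fixed rather than tuned by cross-validation or adjusted dynamically as the thesis describes.

```python
import numpy as np


def make_mask(X, rate, mechanism, rng):
    """Return a boolean mask (True = missing) for column 0 of X.

    Illustrative assumption: only column 0 loses values, and column 1
    stays fully observed.
    """
    n = X.shape[0]
    if mechanism == "MCAR":
        # Missingness is independent of every data value.
        score = rng.random(n)
    elif mechanism == "MAR":
        # Missingness depends only on an observed variable (column 1).
        score = X[:, 1] + 0.1 * rng.standard_normal(n)
    elif mechanism == "MNAR":
        # Missingness depends on the value that goes missing (column 0).
        score = X[:, 0] + 0.1 * rng.standard_normal(n)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    n_missing = int(round(rate * n))
    mask = np.zeros(n, dtype=bool)
    # Rows with the highest scores become missing.
    mask[np.argsort(score)[n - n_missing:]] = True
    return mask


def gaussian_knn_impute(X, mask, k=5, sigma=1.0):
    """Fill column 0 where mask is True with a Gaussian-weighted KNN mean."""
    X_filled = np.asarray(X, dtype=float).copy()
    donors = X_filled[~mask]  # complete rows only
    for i in np.flatnonzero(mask):
        # Distances use only the observed features (columns 1 onward).
        d = np.linalg.norm(donors[:, 1:] - X_filled[i, 1:], axis=1)
        nearest = np.argsort(d)[:k]
        # Gaussian kernel: closer neighbors receive larger weights.
        w = np.exp(-d[nearest] ** 2 / (2.0 * sigma ** 2))
        if w.sum() == 0.0:
            # All weights underflowed; fall back to a plain average.
            w = np.ones_like(w)
        X_filled[i, 0] = np.average(donors[nearest, 0], weights=w)
    return X_filled
```

A possible usage is to build a complete matrix `X`, call `make_mask(X, 0.2, "MAR", np.random.default_rng(0))`, set `X[mask, 0] = np.nan`, and pass the result to `gaussian_knn_impute`. With a fixed `sigma`, a sample whose neighbors are all far away gives them nearly equal, near-zero weights, which is one way to see the weight allocation problem that the dynamic parameter adjustment in the abstract is meant to address.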
Keywords/Search Tags: Missing Mechanism, K-Nearest Neighbor, Iteration Method, Missing Forest, Dynamic Parameter Adjustment