Research On Missing Values Imputation Of Tabular Data Based On Graph Reconstruction

Posted on:2023-12-20

Degree:Master

Type:Thesis

Country:China

Candidate:J J He

Full Text:PDF

GTID:2558307070484394

Subject:Engineering

Abstract/Summary:


Tabular data is one of the most common data types in real life, but it often contains missing values for various reasons. Missing values cause a loss of information and make it difficult to apply the many algorithms that require complete data. Missing value imputation for tabular data has therefore become an indispensable step in data mining. Traditional imputation methods based on statistical learning tend to make strong assumptions about the data distribution and are usually suboptimal for mixed data that include continuous variables. Machine learning-based imputation methods recast missing value imputation as a machine learning prediction problem. Graphs, as an abstract structure that can effectively represent relationships between entities, have recently received extensive attention: researchers have proposed using graph neural networks to model the relationships in tabular data and have achieved good results. However, the effectiveness of these methods depends heavily on the quality of the graph constructed from the data. Current methods convert tabular data into a simple bipartite graph, which cannot effectively represent the relationships within the data and therefore limits imputation performance. To generate a graph structure that better reflects the relationships in the data, this thesis proposes a structure discovery method for tabular data and uses it to reconstruct the simple bipartite graph, so that the reconstructed graph better captures the structural information in the tabular data. It also proposes an imputation framework adapted to the new graph structure to fill in the missing values. The main contributions are as follows:

(1) A graph learning imputation framework based on graph reconstruction. First, the structure of the tabular data is discovered, and the graph is reconstructed according to the obtained structural information. Because the reconstructed graph differs markedly from the original graph, the framework also introduces a message propagation and information aggregation mechanism for the reconstructed graph. It generates node embeddings on the reconstructed graph and performs link prediction to impute the missing values.

(2) A graph reconstruction scheme based on association rules. The FP-Growth algorithm is used to mine the structure present in the tabular data, and the graph is then reconstructed according to the resulting structural information, namely association rules. This scheme effectively imputes missing values in tabular data composed mainly of categorical attributes.

(3) A graph reconstruction scheme based on a graph autoencoder. The association rule-based method must bin continuous data, which makes it sensitive to the choice of binning algorithm. In the proposed scheme, a graph autoencoder constructs a low-dimensional embedding of each sample in the structure space, and a marginal probability matrix between samples is constructed to guide the graph reconstruction. This scheme performs efficient imputation on mixed-type data.

The association rule-based scheme was tested on 7 public datasets and improved imputation accuracy on 6 of them compared with 8 baseline imputation models. The graph autoencoder-based scheme was tested on 12 datasets and improved imputation accuracy by 1% to 20% over the best baseline model on every dataset.
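The abstract does not specify the encoder architecture, the exact definition of the "marginal probability matrix", or how edges are selected from it, so the following is only a minimal NumPy sketch under stated assumptions. It builds the sample-feature bipartite graph that the abstract describes as the baseline representation, encodes samples with one untrained GCN layer, and scores every sample pair with the standard graph-autoencoder inner-product decoder, sigmoid(z_i · z_j). Keeping the top-k pairs per sample as new sample-sample edges is an assumed reconstruction rule. The final fill step (neighbor mean) is a simple stand-in for the thesis's link-prediction decoder, not its actual method. All function names are hypothetical.

```python
import numpy as np


def bipartite_adjacency(X: np.ndarray) -> np.ndarray:
    """Sample-feature bipartite graph: nodes 0..n-1 are samples, n..n+d-1 are
    features, and sample i links to feature j wherever X[i, j] is observed."""
    n, d = X.shape
    observed = (~np.isnan(X)).astype(float)
    A = np.zeros((n + d, n + d))
    A[:n, n:] = observed
    A[n:, :n] = observed.T
    return A


def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)


def reconstruct_sample_graph(X: np.ndarray, dim: int = 8, k: int = 2, seed: int = 0):
    """Encode samples with an untrained GCN over the bipartite graph, score every
    sample pair with an inner-product decoder, and keep the top-k pairs per sample
    (top-k selection is an assumption, not taken from the thesis)."""
    n, d = X.shape
    A = bipartite_adjacency(X)
    # Assumed input features: observed values (NaN -> 0) for sample nodes,
    # one-hot identity vectors for feature nodes.
    H0 = np.vstack([np.nan_to_num(X, nan=0.0), np.eye(d)])
    W = np.random.default_rng(seed).normal(scale=0.5, size=(d, dim))  # untrained weights
    Z = gcn_layer(A, H0, W)[:n]  # keep sample embeddings only
    # Inner-product decoder: P[i, j] = sigmoid(z_i . z_j), a pairwise edge probability.
    P = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
    np.fill_diagonal(P, 0.0)  # exclude self-loops from neighbor selection
    nbrs = np.argsort(-P, axis=1)[:, :k]
    S = np.zeros((n, n), dtype=bool)
    S[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    return P, S | S.T  # symmetrize the new sample-sample edges


def neighbor_mean_impute(X: np.ndarray, S: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for link prediction: fill X[i, j] with the mean observed
    value of column j among i's reconstructed neighbors (column mean if none)."""
    X_imp = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        vals = X[S[i], j]
        vals = vals[~np.isnan(vals)]
        X_imp[i, j] = vals.mean() if vals.size else np.nanmean(X[:, j])
    return X_imp


# Toy table with three missing entries (NaN).
X = np.array([[1.0, 2.0, np.nan],
              [1.1, np.nan, 3.0],
              [5.0, 6.0, 7.0],
              [np.nan, 6.1, 7.2]])
P, S = reconstruct_sample_graph(X, k=2)
X_filled = neighbor_mean_impute(X, S)
print(X_filled)
```

In a real graph autoencoder the weights `W` would be trained, for example to reconstruct the observed sample-feature edges, before the pairwise probabilities are used. The untrained forward pass here only shows how the embedding, decoder, and reconstruction steps fit together.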

Keywords/Search Tags:

Tabular Data, Missing Data Imputation, Graph Neural Network, Graph Reconstruction


Related items

1	Studies On Missing Data Imputation
2	Research On Missing Data Imputation Method Based On Generative Adversarial Network
3	Attribute Associated Neuron Modeling And Missing Value Imputation Based On Neural Network
4	Incomplete Data Modeling And Missing Value Imputation Based On Confidence
5	Research On Missing Value Imputation Of Incomplete Data
6	The Graph-based Semi-supervised Learning With Missing Data
7	Comparative Study On Imputation Methods Of Missing Data In XGBOOST Model Under Complete Random Missing Mechanism
8	Nonparametric Imputation For Missing Data
9	Research On Passenger Transport Data Quality Detection And Missing Data Imputation
10	Several Studies To Improve Deep Data Imputation