Research On Missing Data Imputation Method Based On Generative Adversarial Network

Posted on: 2024-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: L Qin
Full Text: PDF
GTID: 2558307079492814
Subject: Electronic Information·Computer Technology (Professional Degree)
Abstract/Summary:
Missing value imputation is an important part of feature engineering. It serves models that cannot accept missing values as input and improves model performance, and the choice of imputation strategy has a significant impact on the accuracy of the trained model. Because generative adversarial networks (GANs) can generate realistic data, using them for missing value imputation helps recover the original distribution of the data as closely as possible and preserves the accuracy of data analysis and modeling. Because GANs are efficient to train and to generate from, and adapt well to datasets whose features follow different distributions, they can greatly improve both the efficiency of feature engineering and the quality of the dataset. This thesis covers two lines of research:

(1) To improve the efficiency of feature engineering and the quality of datasets, this thesis proposes a new GAN-based method that imputes missing data according to the correlation between features. The method uses the Pearson correlation coefficient matrix of the features and the statistical distribution characteristics of the dataset to define the training objective of the generator, and uses the Wasserstein distance from WGAN (Wasserstein Generative Adversarial Networks) to measure the gap between the original data and the data output by the generator. This avoids the vanishing gradient problem that arises during training when the original and generated data differ too much. The method improves both the plausibility of the imputed values and the accuracy of models trained on the imputed data. Compared with other imputation methods, it is also more stable across datasets with different characteristics.

(2) This thesis proposes an industrial data mining solution that runs as a continuous pipeline from data input to the output of analysis results. To meet the needs of different industries that want to use data generated during production for analysis, modeling, and quantitative prediction, the solution adopts a distributed storage and computing framework to handle the large-scale data input that may arise in real production scenarios, and it simplifies as far as possible the steps users take to adjust process parameters in the system. GAN-based missing data imputation is provided as one function of the mining system, which demonstrates the method's advantages in efficiency and adaptability. This simple and powerful system can help industry move from digitalization toward intelligence.
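
As a rough illustration of the first contribution, the sketch below shows one way a generator objective could combine a WGAN critic term with a penalty on the gap between Pearson correlation matrices. It is not the thesis's implementation: the function names, the mean-squared form of the penalty, the `weight` parameter, and the use of fully observed rows as the correlation reference are all assumptions made for this example, and random numbers stand in for a trained generator and critic.

```python
import numpy as np

def combine(x_observed, mask, generated):
    """Keep observed entries (mask == 1); fill missing ones with generator output."""
    return mask * x_observed + (1.0 - mask) * generated

def pearson_matrix(data):
    """Pearson correlation matrix between the columns (features) of `data`."""
    return np.corrcoef(data, rowvar=False)

def correlation_penalty(reference_corr, imputed):
    """Mean squared gap between a reference and the imputed correlation matrix.
    The squared-error form is an assumption, not taken from the thesis."""
    return float(np.mean((reference_corr - pearson_matrix(imputed)) ** 2))

def critic_loss(scores_real, scores_fake):
    """Standard WGAN critic loss: minimizing it widens the real/fake score gap."""
    return float(np.mean(scores_fake) - np.mean(scores_real))

def generator_loss(scores_fake, reference_corr, imputed, weight=1.0):
    """Standard WGAN generator term plus the (assumed) correlation penalty."""
    wgan_term = float(-np.mean(scores_fake))
    return wgan_term + weight * correlation_penalty(reference_corr, imputed)

# Toy data: two strongly correlated features with about 20% of entries missing.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
x = np.column_stack([a, 2.0 * a + 0.1 * rng.normal(size=200)])
mask = (rng.random(x.shape) > 0.2).astype(float)   # 1 = observed, 0 = missing
x_observed = x * mask                              # missing entries zeroed out

# Reference correlations estimated from fully observed rows (an assumption).
complete_rows = mask.all(axis=1)
reference_corr = pearson_matrix(x[complete_rows])

# Random stand-ins for the generator output and the critic scores.
generated = rng.normal(size=x.shape)
imputed = combine(x_observed, mask, generated)
scores_fake = rng.normal(size=len(x))

print(generator_loss(scores_fake, reference_corr, imputed))
```

In this toy setup the random fill breaks the relationship between the two features, so the correlation penalty is positive. A generator trained against such a term would be pushed toward imputations that restore the observed correlation structure.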
Keywords/Search Tags: missing value imputation, Generative Adversarial Networks, data mining, feature engineering