
Research On Data Cleaning Framework And Application For Open Government Data

Posted on: 2021-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: G F Zheng
GTID: 2416330602489615
Subject: Management Science and Engineering
Abstract/Summary:
The government holds rich and valuable data resources, and opening government data ("open government data") can promote their effective use and prevent data resources from sitting idle or going to waste. China's open government data movement is in a stage of rapid development: in 2019 alone, more than 50 new local government open data platforms were launched. This rapid growth has, however, brought many problems. Compared with developed economies such as the United States and the European Union, China's open government data suffers from low data quality and irregular formats. Data quality determines how usable and how easy to use data are, so these problems weaken the effect of opening government data; only high-quality data is truly usable data. Data cleaning is one way to improve the quality of China's open government data, but there is not yet a data cleaning framework or tool suited to the specific quality problems found in it, which limits the effect of data opening. To address this, the main work of this thesis is as follows:

(1) Survey China's open government data, identify its data quality problems, and record the problems found along each dimension of the general data quality standards used in the open data field.

(2) Based on the types and characteristics of the "dirty data" in China's open government data, define its cleaning needs as "dirty data" cleaning and data format conversion. A rule-based data cleaning framework suited to China's open government data is designed and developed. Following international and domestic data standards, cleaning rules are applied to the "dirty data" to improve data quality, and the cleaned data is then converted into multiple formats to meet users' needs.

(3) Investigate how local government open data platforms in China publish COVID-19 epidemic data. The data analysis module of the framework analyzes the quality of the epidemic data and produces a quality metadata table; this table, together with the required cleaning rules, is used to clean the data, and the data is compared before and after each cleaning rule is applied. This realizes both "dirty data" cleaning and data format conversion and demonstrates that the framework is usable.

The purpose of this thesis is to improve the quality of China's open government data through data cleaning and to provide a reference for the design of data cleaning frameworks in the field of open government data.
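The abstract does not specify the layout of the quality metadata table mentioned in (3), so the following is only a minimal sketch under stated assumptions: records are Python dictionaries of strings, only two quality dimensions are measured (completeness, the share of non-empty values, and format validity, the share of non-empty values matching a pattern), and the column names, patterns and toy values are hypothetical rather than taken from the thesis or from any government platform.

```python
import re
from dataclasses import dataclass

# Hypothetical format rules, standing in for the "international and domestic
# data standards" mentioned above; columns without a rule are only checked
# for completeness.
FORMAT_RULES = {
    "date": re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"),  # ISO 8601 calendar date
    "confirmed": re.compile(r"[0-9]+"),                  # non-negative integer count
}


@dataclass
class ColumnQuality:
    """One row of a (hypothetical) quality metadata table."""
    column: str
    completeness: float  # share of values that are non-empty
    validity: float      # share of non-empty values matching the format rule


def profile(records, columns, rules=FORMAT_RULES):
    """Build a quality metadata table with one row per column."""
    table = []
    for col in columns:
        values = ["" if rec.get(col) is None else str(rec.get(col)).strip()
                  for rec in records]
        filled = [v for v in values if v]
        pattern = rules.get(col)
        valid = [v for v in filled if pattern is None or pattern.fullmatch(v)]
        completeness = len(filled) / len(values) if values else 0.0
        validity = len(valid) / len(filled) if filled else 0.0
        table.append(ColumnQuality(col, completeness, validity))
    return table


# Toy records (made up for illustration): one blank region, one non-ISO date,
# one missing count and one non-numeric count.
records = [
    {"region": "District A", "date": "2020-02-01", "confirmed": "12"},
    {"region": "District A", "date": "2020/02/02", "confirmed": ""},
    {"region": "", "date": "2020-02-03", "confirmed": "n/a"},
    {"region": "District B", "date": "", "confirmed": "7"},
]

for row in profile(records, ["region", "date", "confirmed"]):
    print(f"{row.column:<10} completeness={row.completeness:.2f} "
          f"validity={row.validity:.2f}")
```

Columns with low validity point to where cleaning rules are needed; a fuller profile would presumably add further dimensions (for example timeliness or uniqueness) from the open data quality standards surveyed in (1).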
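The rule-based cleaning and format conversion described in (2) and (3) can be sketched in the same hedged way. Here each cleaning rule is assumed to be a plain function that takes one record and returns a cleaned record, or None to drop it; the rule names (`strip_whitespace`, `require_region`, `normalize_date`, `parse_count`), the date and count conventions, and the toy data are illustrative assumptions, not the framework's actual rules. Counting how many records each rule changes gives a simple before/after comparison per rule, and the cleaned records are then exported to CSV and JSON with the Python standard library.

```python
import csv
import io
import json
import re
from collections import Counter

# Each hypothetical rule maps one record to a cleaned record, or to None to drop it.

def strip_whitespace(rec):
    return {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}


def require_region(rec):
    # Drop records that cannot be attributed to any region.
    return rec if rec.get("region") else None


def normalize_date(rec):
    # Rewrite dates such as "2020/2/1" or "2020.02.01" as ISO 8601 "2020-02-01".
    m = re.fullmatch(r"([0-9]{4})[./-]([0-9]{1,2})[./-]([0-9]{1,2})",
                     str(rec.get("date") or ""))
    if not m:
        return rec
    year, month, day = m.groups()
    return {**rec, "date": f"{year}-{int(month):02d}-{int(day):02d}"}


def parse_count(rec):
    # Convert digit strings to int; any other string becomes None (missing).
    value = rec.get("confirmed")
    if isinstance(value, str):
        value = int(value) if re.fullmatch(r"[0-9]+", value) else None
    return {**rec, "confirmed": value}


def clean(records, rules):
    """Apply rules in order, drop exact duplicates, and count changes per rule."""
    changes = Counter()
    seen = set()
    cleaned = []
    for rec in records:
        for rule in rules:
            new = rule(rec)
            if new != rec:  # a dropped record (None) also counts as a change
                changes[rule.__name__] += 1
            rec = new
            if rec is None:
                break
        if rec is None:
            continue
        key = tuple(sorted(rec.items()))
        if key not in seen:  # exact duplicates are only detectable after cleaning
            seen.add(key)
            cleaned.append(rec)
    return cleaned, changes


def to_csv(records, fieldnames):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # None is written as an empty field
    return buf.getvalue()


def to_json(records):
    return json.dumps(records, ensure_ascii=False)  # keep non-ASCII text readable


raw = [
    {"region": " District A ", "date": "2020/2/1", "confirmed": "12"},
    {"region": "District A", "date": "2020-02-01", "confirmed": "12"},  # duplicate once cleaned
    {"region": "", "date": "2020-02-02", "confirmed": "3"},             # no region: dropped
    {"region": "District B", "date": "2020.02.02", "confirmed": "n/a"},
]
RULES = [strip_whitespace, require_region, normalize_date, parse_count]

cleaned, changes = clean(raw, RULES)
print(dict(changes))
print(to_csv(cleaned, ["region", "date", "confirmed"]))
print(to_json(cleaned))
```

Keeping each rule as an independent function makes it easy to add, remove or reorder rules per dataset, which is one plausible reading of a rule-based design; the framework in the thesis may organize its rules differently.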
Keywords/Search Tags:Open Government Data, Data Quality, Data Cleaning, Cleaning Rules, Cleaning Framework