
Application Research Of Data Model For Big Data Warehouse In E-Government

Posted on: 2021-04-27 | Degree: Master | Type: Thesis
Country: China | Candidate: D H Zheng
GTID: 2416330602470661 | Subject: Computer technology
Abstract/Summary:
In the era of big data, the data engine has become the core driving force for organizational service innovation, economic and social development, and the modernization of national governance capabilities. Building a new platform driven by big data has become an important part of E-government development. However, traditional data warehouses based on relational database management systems have limitations in storing, processing, and analyzing large-scale, wide-ranging data. In addition, poor data quality in the E-government field surfaces during data aggregation. This leaves the data insufficiently credible and hampers data sharing and big-data-aided decision analysis, so effective data management and governance through data modeling of a big data warehouse is urgently needed. Furthermore, when massive data is integrated into a data warehouse, its sheer size makes collecting the full data set on every run impossible in most scenarios, so incremental collection of massive data has drawn wide attention.

This thesis studies a data model for a big data warehouse applied in E-government from three aspects: data hierarchy, a data governance model, and incremental data collection.

First, based on Kimball's dimensional data modeling theory and a Hive data warehouse built on Hadoop, a hierarchical architecture for the big data warehouse data model is given. The architecture is divided into four layers: Data Staging Store (STG), Operational Data Store (ODS), Public Data Warehouse (PDW), and Application Data Mart (ADM). Following this hierarchical architecture and a set of naming rules, the data model for the big data warehouse is designed and implemented, so that big data can support scientific decision-making and precisely targeted strategies in E-government.

Second, the data collected from government departments was found to be of low quality. Drawing on the data governance process areas of a data governance framework and on the hierarchical architecture of the data model, a data quality governance model is studied and discussed. Using data quality rules and a feedback loop, the model guides the optimization and improvement of source data quality in terms of content format, data model, and data standards. Data quality rules are divided into data transformation rules and data audit rules. The feedback loop adopts the PDCA (Plan-Do-Check-Act) quality management method to trace data quality problems back to the data source.

Third, by studying synchronization techniques for full and incremental data integration, the NICDC (non-intrusive change data capture) method is proposed. Drawing on the ideas of full-table comparison and the timestamp method, NICDC combines calculations in the time dimension and the space dimension, improving change data capture performance with respect to data columns and data rows, respectively. In practice, NICDC can quickly obtain changed data without upgrading the business system, and it improves both the efficiency of data integration and the accuracy of the changed data captured for the big data warehouse.
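A minimal Python sketch can illustrate the four-layer hierarchy and its naming rules. The abstract does not state the thesis's actual naming rules. The `<layer>_<subject>_<entity>` pattern, the `table_name` and `next_layer` helpers, and the example subject `civil` are therefore all hypothetical. Only the layer names come from the text, and reading their listed order as the data-flow order is an assumption.

```python
import re
from enum import Enum
from typing import Optional


class Layer(Enum):
    """The four warehouse layers named in the abstract (order assumed to be data flow)."""
    STG = "stg"  # Data Staging Store
    ODS = "ods"  # Operational Data Store
    PDW = "pdw"  # Public Data Warehouse
    ADM = "adm"  # Application Data Mart


# Hypothetical naming rule: <layer>_<subject>_<entity>, lowercase snake_case.
NAME_PATTERN = re.compile(r"^(stg|ods|pdw|adm)_[a-z][a-z0-9]*_[a-z][a-z0-9_]*$")


def table_name(layer: Layer, subject: str, entity: str) -> str:
    """Build a table name and reject it if it breaks the hypothetical rule."""
    name = f"{layer.value}_{subject}_{entity}".lower()
    if not NAME_PATTERN.match(name):
        raise ValueError(f"invalid table name: {name}")
    return name


def next_layer(layer: Layer) -> Optional[Layer]:
    """Return the layer fed by this one (STG -> ODS -> PDW -> ADM), or None for ADM."""
    order = list(Layer)  # Enum iteration follows definition order
    idx = order.index(layer)
    return order[idx + 1] if idx + 1 < len(order) else None


print(table_name(Layer.ODS, "civil", "household_registration"))
# ods_civil_household_registration
```

Encoding the layer as a name prefix lets a validator or a Hive metastore query group tables by layer without extra metadata.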
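The split between data transformation rules and data audit rules can be sketched as a small rule engine. This is an illustrative assumption, not the thesis's implementation. Transformation rules rewrite a record into a standard format, and audit rules are named predicates. Each audit failure is logged against the supplying department so it can later be fed back to the source. The `QualityRules` class, the `Issue` record, and the example rules are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


@dataclass
class Issue:
    source: str     # department that supplied the record
    rule: str       # name of the audit rule that failed
    record_id: str


@dataclass
class QualityRules:
    # Transformation rules: each maps a record to a standardized record.
    transforms: List[Callable[[Record], Record]] = field(default_factory=list)
    # Audit rules: rule name -> predicate that a clean record must satisfy.
    audits: Dict[str, Callable[[Record], bool]] = field(default_factory=dict)

    def apply(self, source: str, records: List[Record]) -> Tuple[List[Record], List[Issue]]:
        clean: List[Record] = []
        issues: List[Issue] = []
        for rec in records:
            for transform in self.transforms:
                rec = transform(rec)
            failed = [name for name, check in self.audits.items() if not check(rec)]
            if failed:
                issues.extend(Issue(source, name, rec["id"]) for name in failed)
            else:
                clean.append(rec)
        return clean, issues


def normalize_date(rec: Record) -> Record:
    """Hypothetical transformation rule: '2021/04/27' -> '2021-04-27'."""
    d = rec.get("date")
    return {**rec, "date": d.strip().replace("/", "-")} if d else rec


rules = QualityRules(
    transforms=[normalize_date],
    audits={
        "name_not_empty": lambda r: bool(r.get("name")),
        "date_is_iso": lambda r: bool(r.get("date"))
        and re.fullmatch(r"\d{4}-\d{2}-\d{2}", r["date"]) is not None,
    },
)
clean, issues = rules.apply("dept_a", [
    {"id": "1", "name": "Li", "date": "2021/04/27"},
    {"id": "2", "name": "", "date": "27.04.2021"},
])
```

Here record 1 passes after normalization. Record 2 produces two `Issue` entries, one per failed audit rule.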
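The PDCA feedback loop can be sketched as repeated rounds over newly collected batches. Plan chooses the audit rules and an acceptable failure rate. Do runs the rules on a batch, Check aggregates failures per source and rule, and Act sends notices to the responsible department. Comparing rounds then shows whether the feedback helped. The abstract does not describe the thesis's feedback mechanism in this detail. The `Cycle` record, the function names, and the threshold are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

Failure = Tuple[str, str]  # (source department, audit rule name)


@dataclass
class Cycle:
    """Results of one PDCA round for one batch of collected data."""
    round_no: int
    total: int          # number of records audited in the Do step
    failures: Counter   # Failure -> number of failing records


def check(round_no: int, total: int, failures: List[Failure]) -> Cycle:
    """Check: aggregate audit failures per (source, rule)."""
    return Cycle(round_no, total, Counter(failures))


def act(cycle: Cycle, threshold: float) -> List[str]:
    """Act: emit a feedback notice for each (source, rule) above the threshold."""
    notices = []
    for (source, rule), n in sorted(cycle.failures.items()):
        rate = n / cycle.total
        if rate > threshold:
            notices.append(f"round {cycle.round_no}: {source} fails {rule} on {rate:.0%} of records")
    return notices


def failure_rate(cycle: Cycle) -> float:
    return sum(cycle.failures.values()) / cycle.total


def improved(prev: Cycle, curr: Cycle) -> bool:
    """Compare overall failure rates; the result informs the next Plan step."""
    return failure_rate(curr) < failure_rate(prev)


round1 = check(1, 100, [("dept_a", "date_is_iso")] * 20 + [("dept_b", "name_not_empty")] * 2)
print(act(round1, threshold=0.05))
# ['round 1: dept_a fails date_is_iso on 20% of records']
round2 = check(2, 100, [("dept_a", "date_is_iso")] * 3)
print(improved(round1, round2))  # True
```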
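The abstract does not specify the NICDC algorithm. The following is therefore only a generic sketch of how its two reference techniques, the timestamp method and full-table comparison, might be combined. The source is only read, and nothing in the business system is changed. A timestamp filter narrows which rows must be compared, and hashing only a chosen subset of columns narrows each per-row comparison. Mapping the abstract's "rows" and "columns" onto these two steps is an assumption. All names (`capture_changes`, `row_digest`, the `ts` column) are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Optional, Tuple

Row = Dict[str, object]


def row_digest(row: Row, columns: List[str]) -> str:
    """Column-side reduction: hash only the tracked columns."""
    payload = "\x1f".join(str(row.get(c)) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def capture_changes(
    source_rows: Iterable[Row],
    prev_digests: Dict[object, str],
    key: str,
    columns: List[str],
    ts_column: Optional[str] = None,
    watermark: Optional[object] = None,
) -> Tuple[List[Row], List[Row], List[object], Dict[object, str]]:
    """Return (inserts, updates, deleted keys, new digest snapshot)."""
    inserts: List[Row] = []
    updates: List[Row] = []
    seen = set()
    new_digests = dict(prev_digests)
    for row in source_rows:
        k = row[key]
        seen.add(k)
        # Row-side reduction: a known row whose timestamp is not newer than
        # the watermark is assumed unchanged and is not hashed.
        if ts_column and watermark is not None and k in prev_digests and row[ts_column] <= watermark:
            continue
        d = row_digest(row, columns)
        if k not in prev_digests:
            inserts.append(row)
        elif prev_digests[k] != d:
            updates.append(row)
        new_digests[k] = d
    # Full key comparison: keys that vanished from the source are deletions.
    deletes = [k for k in prev_digests if k not in seen]
    for k in deletes:
        del new_digests[k]
    return inserts, updates, deletes, new_digests


ins, upd, dels, snap = capture_changes(
    [{"id": 1, "name": "a", "ts": 1}, {"id": 2, "name": "b", "ts": 1}], {}, "id", ["name"], "ts"
)
ins2, upd2, dels2, snap2 = capture_changes(
    [{"id": 1, "name": "a2", "ts": 2}, {"id": 3, "name": "c", "ts": 2}], snap, "id", ["name"], "ts", 1
)
```

Trusting the timestamp means a row edited without a timestamp update is missed. The key scan is still needed because a timestamp column alone cannot reveal deleted rows.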
Keywords/Search Tags: big data warehouse, Hive, data hierarchy, data governance, data collection