Research And Implementation On Mass Data Cleaning In E-Government System

Posted on:2011-03-14

Degree:Master

Type:Thesis

Country:China

Candidate:Q L Zhu

Full Text:PDF

GTID:2178360302980380

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of information technology, organizational managers increasingly depend on data when making decisions. Data warehouses, built on top of databases, have emerged to support decision analysis. During the construction of a data warehouse, however, data are loaded from many different sources and may carry numerous data quality problems, which lead to faulty decision analysis and degrade the quality of information services. A data cleansing process is therefore needed to improve data quality. Data cleansing is becoming an important topic in data warehousing and data mining, as well as in web data processing.

In this thesis, we describe data cleansing in detail. We introduce its concept and significance, and the current state of research and application at home and abroad. We summarize the theories, methods, evaluation criteria and basic workflow of data cleansing. We also introduce domain ontology and the Web Ontology Language (OWL). Our research focuses on data cleansing based on domain ontology.

After analyzing the limitations of traditional knowledge base structures, we build an extended tree-like knowledge base by decomposing and recomposing the domain knowledge. Each leaf node of the tree is linked to a knowledge instance, called atomic knowledge, and each non-leaf node is linked to a knowledge concept. Based on this knowledge base, a data cleaning algorithm is proposed. It first extracts the atomic knowledge of the selected nodes, then analyzes the relations among them, removes duplicate objects, builds a weight-based atomic knowledge sequence, and finally cleans the data according to that sequence. Experiments showed that using the algorithm to optimize users' requests greatly reduces the number of scans over the mass data and clearly improves data cleaning efficiency.
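The cleaning procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `Node` and `AtomicKnowledge` classes, the example rules, the field names, and the choice to apply higher weights first are all assumptions made for illustration, and the relation-analysis step is omitted because the abstract does not describe it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


@dataclass
class AtomicKnowledge:
    """A leaf-level cleaning rule (hypothetical structure, not from the thesis)."""
    name: str
    weight: int
    apply: Callable[[Record], Record]


@dataclass
class Node:
    """A knowledge-base node: non-leaf nodes hold a concept, leaves hold a rule."""
    concept: str
    children: List["Node"] = field(default_factory=list)
    rule: Optional[AtomicKnowledge] = None  # set only on leaf nodes


def collect_rules(node: Node) -> List[AtomicKnowledge]:
    """Extract the atomic knowledge found under a selected node."""
    if node.rule is not None:
        return [node.rule]
    rules: List[AtomicKnowledge] = []
    for child in node.children:
        rules.extend(collect_rules(child))
    return rules


def build_sequence(selected: List[Node]) -> List[AtomicKnowledge]:
    """Drop duplicate rules, then order them by weight.

    Assumption: a higher weight means the rule is applied earlier.
    """
    unique: Dict[str, AtomicKnowledge] = {}
    for node in selected:
        for rule in collect_rules(node):
            unique.setdefault(rule.name, rule)
    return sorted(unique.values(), key=lambda r: r.weight, reverse=True)


def clean(records: List[Record], sequence: List[AtomicKnowledge]) -> List[Record]:
    """Apply the whole rule sequence to each record in a single scan."""
    cleaned = []
    for record in records:
        for rule in sequence:
            record = rule.apply(record)
        cleaned.append(record)
    return cleaned


# Hypothetical rules and data, for illustration only.
def strip_fields(record: Record) -> Record:
    return {key: value.strip() for key, value in record.items()}


def upper_region(record: Record) -> Record:
    return {**record, "region": record["region"].upper()}


trim = AtomicKnowledge("trim_whitespace", 2, strip_fields)
upper = AtomicKnowledge("uppercase_region", 1, upper_region)

name_node = Node("name", children=[Node("trim", rule=trim)])
address_node = Node(
    "address",
    children=[Node("trim", rule=trim), Node("region_case", rule=upper)],
)

sequence = build_sequence([name_node, address_node])
cleaned = clean([{"name": " Li Wei ", "region": " zj "}], sequence)
```

In this sketch the two selected nodes share the `trim_whitespace` rule, so it is applied only once, and all rules run during a single pass over the records. That is one plausible reading of how merging users' requests could reduce the number of scans over the data.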

Keywords/Search Tags:

Data Cleansing, Domain Ontology, Field Cleansing, Duplicate Cleansing, Knowledge Base

Related items

1	Data Cleaning Algorithm And Applications
2	Study And Application Of The Data Cleansing Technology In ETL
3	Analysis And Design Of Domain Based Chinese Data Cleansing System
4	Research And Implementation Of Data Cleansing Framework Based On Component
5	The Research Of Data Cleansing With XML
6	Research And Application Of Data Cleansing In Multi-radar Data Fusion Algorithm
7	Data Bryte: A standards/model-based data cleansing framework
8	Study And Implementation Of A Data Cleansing System Based On Multi-Agent Technology
9	Study And Implementation Of A Data Cleansing System Based On Multi-agent Technology
10	Duplicates Cleansing Based On Semantic Association