| With the rapid development of information technology, organizational managers increasingly depend on data when making decisions. Data warehouses, which are built on database technology, have emerged to support decision analysis. However, when a data warehouse is constructed, data are loaded from many different sources, and the resulting data quality problems can lead to faulty decision analysis and degrade the quality of information services. A data cleansing process is therefore needed to improve data quality. Data cleansing is becoming an important topic in data warehousing, data mining, and web data processing. In this paper, we describe data cleansing in detail. We introduce its concept and significance and survey the current state of research and application at home and abroad. We summarize the theories, methods, evaluation criteria, and basic workflow of data cleansing. We also introduce domain ontology and the Web Ontology Language (OWL). Our research focuses on data cleansing based on domain ontology. After analyzing the limitations of traditional knowledge base structures, we build an extended tree-like knowledge base by decomposing and recomposing the domain knowledge. Each leaf node of the tree is linked to a knowledge instance, called atomic knowledge, and each non-leaf node is linked to a knowledge concept. Based on this knowledge base, we propose a data cleaning algorithm. It first extracts the atomic knowledge of the selected nodes, then analyzes their relations, removes duplicate objects, builds a weight-ordered sequence of atomic knowledge, and finally cleans the data according to that sequence. The experiment showed that using the algorithm to optimize users' requests markedly reduces the number of scans over massive data and clearly improves data cleaning efficiency. |
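
The steps of the cleaning algorithm summarized above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the paper; it only shows the general shape of the idea under several assumptions. The names `AtomicKnowledge`, `Node`, `collect_atoms`, `build_sequence`, and `clean` are hypothetical, atomic knowledge is modeled as a simple record-to-record function, duplicates are detected by name only (the relation analysis is otherwise omitted), and higher weights are assumed to run first.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class AtomicKnowledge:
    """Hypothetical knowledge instance attached to a leaf node."""
    name: str
    weight: int                         # assumption: higher weight runs first
    apply: Callable[[Record], Record]   # assumption: a rule maps record -> record


@dataclass
class Node:
    """Tree node: non-leaf nodes carry a concept, leaves also carry atoms."""
    concept: str
    children: List["Node"] = field(default_factory=list)
    atoms: List[AtomicKnowledge] = field(default_factory=list)


def collect_atoms(node: Node) -> List[AtomicKnowledge]:
    """Extract the atomic knowledge of every leaf below `node`."""
    if not node.children:
        return list(node.atoms)
    found: List[AtomicKnowledge] = []
    for child in node.children:
        found.extend(collect_atoms(child))
    return found


def build_sequence(selected: List[Node]) -> List[AtomicKnowledge]:
    """Drop duplicate atoms (by name) and order the rest by weight."""
    unique: Dict[str, AtomicKnowledge] = {}
    for node in selected:
        for atom in collect_atoms(node):
            unique.setdefault(atom.name, atom)
    return sorted(unique.values(), key=lambda a: (-a.weight, a.name))


def clean(records: List[Record], sequence: List[AtomicKnowledge]) -> List[Record]:
    """Apply the whole sequence to each record in a single pass over the data."""
    cleaned = []
    for record in records:
        for atom in sequence:
            record = atom.apply(record)
        cleaned.append(record)
    return cleaned


# Illustrative rules and tree (made up for this example).
trim = AtomicKnowledge(
    "trim_whitespace", 2, lambda r: {k: v.strip() for k, v in r.items()}
)
upper_country = AtomicKnowledge(
    "uppercase_country", 1, lambda r: {**r, "country": r["country"].upper()}
)

name_node = Node("name", atoms=[trim])
country_node = Node("country", atoms=[trim, upper_country])
address_node = Node("address", children=[country_node])

sequence = build_sequence([name_node, address_node])
cleaned = clean([{"name": " Ann ", "country": " us "}], sequence)
print([a.name for a in sequence])
print(cleaned)
```

In this sketch the `trim_whitespace` rule is reachable from both selected nodes but is applied only once per record, and all rules run during a single scan of the data. This is the kind of saving in repeated scans that the abstract attributes to optimizing the users' requests.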