
Research On Data Quality Inspection Rule Extraction Technology

Posted on: 2018-02-25  Degree: Master  Type: Thesis
Country: China  Candidate: X S Wang  Full Text: PDF
GTID: 2359330512997504  Subject: Master of Engineering
Abstract/Summary:
With the development of information technology, data has penetrated all walks of life and every aspect of production and management, and its scale keeps growing. However, the phenomenon of "rich data, poor information" is becoming increasingly obvious. There are two main reasons: on the one hand, data is not sufficiently integrated and analyzed; on the other hand, the emergence of dirty data seriously degrades data quality, so that existing data in many industries cannot be used effectively. Data quality is the basis of analysis, data mining, and decision making. High-quality data not only reflects the real world accurately but also effectively supports enterprise operations and decision making, so data quality has become a hot issue in the field of data management.

At present, data quality management mainly relies on data quality inspection rules to judge the validity of data and to grade its quality. These rules are closely tied to the business and are written manually by domain experts and data management experts. The number of rules is large, and writing them by hand is inefficient and time-consuming and makes it difficult to guarantee that the rule set is complete. This thesis borrows the idea of "reverse engineering" from software engineering and, with the help of machine learning techniques, studies the key technologies for automatically generating data quality inspection rules, in order to give domain experts more candidate rules to choose from and to improve the efficiency of rule making. To inspect data quality problems comprehensively, the thesis studies data quality evaluation criteria and, from the perspective of rule constraints, examines three kinds of rules in databases: text format, codomain (value range), and functional dependency. It designs the extraction process and a learning algorithm for each of the three kinds of data quality rules, and develops a general, domain-independent system for extracting data quality inspection rules.
Keywords/Search Tags: data quality, rule extraction, functional dependency, text rule
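
To make the three rule types named in the abstract concrete, the following is a minimal Python sketch of how candidate rules of each kind might be derived from sample data. It is not the thesis's algorithm: the function names, the `9`/`A` pattern alphabet, and the sample rows are all assumptions chosen for illustration, and the functional-dependency check only tests single-column dependencies by brute force.

```python
def text_pattern(value):
    """Generalize a string: digits become '9', letters become 'A', other characters are kept."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def text_format_rule(values):
    """Candidate text-format rule: the set of generalized patterns observed in a column."""
    return {text_pattern(v) for v in values}


def value_range_rule(values):
    """Candidate value-range rule: the (min, max) interval observed in a column."""
    return (min(values), max(values))


def holds_fd(rows, lhs, rhs):
    """Return True if column `lhs` determines column `rhs` in every row (rows are dicts)."""
    seen = {}
    for row in rows:
        key = row[lhs]
        if key in seen and seen[key] != row[rhs]:
            return False  # same lhs value maps to two different rhs values
        seen[key] = row[rhs]
    return True


def candidate_fds(rows):
    """List every single-column dependency (a, b) that holds in the sample."""
    cols = list(rows[0].keys())
    return [(a, b) for a in cols for b in cols if a != b and holds_fd(rows, a, b)]


# Hypothetical sample table, invented for this illustration.
ROWS = [
    {"zip": "10001", "city": "New York", "age": 34},
    {"zip": "10001", "city": "New York", "age": 51},
    {"zip": "94105", "city": "San Francisco", "age": 27},
]

if __name__ == "__main__":
    print(text_format_rule([r["zip"] for r in ROWS]))
    print(value_range_rule([r["age"] for r in ROWS]))
    print(candidate_fds(ROWS))
```

On this sample, `age` happens to be unique per row, so it trivially "determines" every other column. Dependencies like this are artifacts of small samples, which is one reason automatically extracted rules are better treated as candidates for expert review than as final inspection rules.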