
Research On Key Problems Of Data Quality In Industrial Big Data Environment

Posted on: 2020-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Liu
Full Text: PDF
GTID: 2381330605458525
Subject: Software engineering
Abstract/Summary:
Industrial big data is the foundation of intelligent manufacturing. Industrial equipment generates large volumes of data during operation, and this data is a key factor in improving manufacturing productivity. However, the data contains quality problems, and applying it directly is likely to produce wrong results. Paying attention to the quality of the data itself is therefore an important aspect of improving the quality of industrial big data. Building on research into traditional data quality problems, this thesis studies the quality problems of industrial big data in light of its characteristics. First, it reviews the state of research on industrial big data quality, analyzes the characteristics of industrial big data, and classifies its data quality problems. Second, for the consistency and timeliness problems in industrial big data, it analyzes their causes and proposes corresponding solutions. The specific work can be summarized as follows:
(1) To address missing timestamps and inaccurate data times in industrial big data, the thesis proposes methods for extracting timeliness rules and repairing data timeliness, so that timeliness problems in industrial data can be handled better.
(2) To address the poor consistency-cleaning performance of traditional data cleaning methods, a data cleaning framework based on data consistency is designed. It automatically and iteratively performs the two processes of data detection and data repair, improving the efficiency of data cleaning.
(3) Building on the proposed solutions to the timeliness and consistency problems, and taking into account the characteristics of data quality problems in the steel-making industry, a data quality solution for the steel-making industry is proposed and implemented in a distributed environment.
Experiments show that the proposed methods are effective in solving data quality problems. The work is significant for research on key data quality problems in the industrial big data environment, for solving data quality problems and improving data quality, and for enterprise decision-making.
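The abstract does not give the framework's rules or algorithms, so the following is only a minimal sketch of what an iterative detect-then-repair loop of the kind described in item (2) could look like. It assumes consistency rules are simple functional dependencies (one column determines another) and that a violating group is repaired by majority vote. Both choices are illustrative assumptions, not the thesis's method, and the column names (`heat_id`, `steel_grade`, `target_temp`) and values are hypothetical.

```python
from collections import Counter, defaultdict


def detect(rows, lhs, rhs):
    """Find groups that violate the rule lhs -> rhs.

    Returns {lhs_value: Counter of rhs values} for every lhs value
    that is paired with more than one distinct rhs value.
    """
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    return {key: counts for key, counts in groups.items() if len(counts) > 1}


def repair(rows, lhs, rhs, violations):
    """Overwrite rhs in each violating group with the group's most frequent value.

    Majority vote is an assumed repair strategy. Returns the number of cells changed.
    """
    changed = 0
    for row in rows:
        counts = violations.get(row[lhs])
        if counts is None:
            continue
        majority = counts.most_common(1)[0][0]
        if row[rhs] != majority:
            row[rhs] = majority
            changed += 1
    return changed


def clean(rows, rules, max_rounds=10):
    """Alternate detection and repair over all rules until a round changes nothing.

    Returns the number of rounds executed, including the final clean round.
    """
    for round_no in range(1, max_rounds + 1):
        total = 0
        for lhs, rhs in rules:
            total += repair(rows, lhs, rhs, detect(rows, lhs, rhs))
        if total == 0:
            return round_no
    return max_rounds


# Hypothetical records: the third row carries a wrong steel grade.
rows = [
    {"heat_id": "H1", "steel_grade": "Q235", "target_temp": 1650},
    {"heat_id": "H1", "steel_grade": "Q235", "target_temp": 1650},
    {"heat_id": "H1", "steel_grade": "Q345", "target_temp": 1650},
    {"heat_id": "H2", "steel_grade": "Q345", "target_temp": 1680},
    {"heat_id": "H2", "steel_grade": "Q345", "target_temp": 1680},
]
rules = [("steel_grade", "target_temp"), ("heat_id", "steel_grade")]

rounds = clean(rows, rules)
print(rounds, rows[2])
```

In this toy data the first pass repairs the third row's temperature against its wrong grade, and only then corrects the grade, so a second detection pass is needed to catch the inconsistency that the first repair introduced. That interaction between rules is one reason a cleaning loop might iterate rather than run detection and repair once.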
Keywords/Search Tags: data quality, data consistency, data cleaning, data timeliness