
Research And Implementation Of Water Quality Time Series Data Cleaning Platform Based On Parallel Computation

Posted on: 2021-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Chen
GTID: 2491306470465754
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the development of intelligent water affairs, the volume of water quality data keeps growing. At the same time, the many variables and strong correlations in water quality data make dirty data very difficult to handle, which seriously affects decision-making analysis in the water industry. Data cleaning has therefore become a core topic of water information research. In recent years, the wide application of machine learning methods has brought great progress in data cleaning and provides many usable solutions for water quality data. Based on users' data cleaning needs and the current distribution characteristics of water quality data, this thesis takes the running efficiency of the model as its entry point and combines it with the temporal attributes of water quality data. It studies how to construct a water quality time series data cleaning platform based on parallel computation, integrates a modern software architecture with the data cleaning process, and provides effective data quality assurance for water decision-making analysis. The thesis gives a comprehensive account of the technologies involved in building and applying the platform. The main work is as follows.

First, drawing on the basic process and current state of data cleaning, the characteristics of existing water quality data, and the way machine learning methods are applied, this thesis divides the cleaning of water quality time series data into three main stages: data preprocessing, abnormal value detection, and missing value filling. The data preprocessing stage handles water quality data that defy common sense. The abnormal value detection stage handles data that do not conform to the normal distribution characteristics of the water quality time series. The missing value filling stage predicts missing values with statistical methods and machine learning methods.

Second, in the missing value filling stage, this thesis uses mean-value filling, Support Vector Regression (SVR), and Long Short-Term Memory (LSTM) to predict water quality data for different patterns of missing data. Because the core parameters of SVR and LSTM are difficult to determine, Particle Swarm Optimization (PSO) is used to tune them, giving a PSO-SVR model and a PSO-LSTM model. The iterative process of PSO is itself improved with a nonlinear decreasing inertia weight strategy. A variety of commonly used algorithm models are included in comparison experiments. On the evaluation metrics, the models used in this thesis predict better than the other models, which supports their effectiveness and accuracy.

Third, to address the excessive running time of the prediction models caused by the large amount of data and the high complexity of the algorithms, the PSO algorithm is parallelized, with a micro-service architecture serving as the parallel computing solution. This further improves the efficiency of the whole cleaning process as well as the ease of use and availability of the platform. The efficiency of the parallel model is verified by comparing the running time of the prediction model before and after parallel optimization.

Finally, the platform is built with a Web development approach that separates the front end from the back end. Java, Python, and MySQL database design are used to build the back-end services, and a JavaScript + HTML + CSS technology stack is used to build the front-end pages. Combined with the water quality time series cleaning process, these form the data cleaning platform. The main functions of each module and the overall functions of the platform are described in detail.

The water quality time series data cleaning platform in this thesis is supported by the national project "Major Science and Technology Program for Water Pollution Control and Treatment of China". The study can effectively clean water quality time series and assist decision-making analysis in the water industry.
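The abstract names PSO with a nonlinear decreasing inertia weight but does not give the schedule. The following is a minimal sketch under stated assumptions, not the thesis's implementation: the quadratic decay in `inertia_weight`, the bounds `w_max=0.9` and `w_min=0.4`, the acceleration coefficients, and the toy `sphere` objective (a stand-in for the validation error of an SVR or LSTM model) are all illustrative choices, and every function name is hypothetical.

```python
import random


def inertia_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Nonlinear decreasing inertia weight.

    Assumed quadratic schedule: stays near w_max early (exploration)
    and falls towards w_min at the end (exploitation).
    """
    return w_max - (w_max - w_min) * (t / t_max) ** 2


def pso(objective, bounds, n_particles=20, n_iter=100, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(n_iter):
        w = inertia_weight(t, n_iter)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective standing in for a model's validation error."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, best_val = pso(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
    print(best, best_val)
```

In a setting like the one the abstract describes, `objective` would train and score a model for a candidate parameter vector, so each call is expensive. Those per-particle evaluations are independent within an iteration, which is presumably what makes the algorithm a natural candidate for parallel execution.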
Keywords/Search Tags: water quality time series, data cleaning, support vector regression, long short-term memory, parallel computing