Research On Cleaning Method Of Big Data For Reservoir Health Monitoring

Posted on:2020-05-27

Degree:Master

Type:Thesis

Country:China

Candidate:W H Li

GTID:2392330575964156

Subject:Architecture and civil engineering

Abstract/Summary:

In recent years, with the rapid adoption of big data mining technology in the water conservancy industry and the spread of the concept of reservoir health management, reservoir health management platforms have accumulated a large amount of reservoir health monitoring data. These massive data are an important basis for establishing a comprehensive reservoir health diagnosis model and for realizing reservoir health warning, optimized dispatching, and similar functions. During long-term data collection and storage, however, network fluctuations, sensor failures, human error, and other factors introduce missing values and outliers into the monitoring data. These missing values and outliers severely reduce the authenticity of the monitoring data. They not only affect the construction of the intelligent comprehensive reservoir health diagnosis model but also interfere with accurate prediction and warning of reservoir health status, and can even provide misleading information that leads to decision-making mistakes by reservoir managers and irreparable consequences. It is therefore of great engineering significance to clean the massive reservoir health monitoring data, which exhibit the big data characteristics of large volume, complex variety, high velocity, low value density, and veracity.

In this thesis, MATLAB is used to program a K-means algorithm based on multiple distances, a K-nearest neighbor algorithm based on multiple distances, the box plot method, an outlier detection method based on Mahalanobis distance, and an outlier detection method based on Euclidean distance. In the K-means and K-nearest neighbor programs, Euclidean distance, Manhattan distance, Minkowski distance, Chebyshev distance, cosine distance, correlation distance, the Spearman correlation coefficient, Hamming distance, and Jaccard distance are introduced as metrics. In the distance-based outlier detection methods, Euclidean distance and Mahalanobis distance are introduced as metrics. The programs are used to fill missing values and detect outliers in the reservoir health monitoring data; the filling effect is analyzed quantitatively using the mean square error, and the outlier detection results are summarized and analyzed. The main conclusions are as follows:

(1) The K-nearest neighbor algorithm based on multiple distances is used to fill missing reservoir health monitoring data. Analysis and comparison show that the variant based on Manhattan distance gives the best filling effect: over repeated runs, the mean square error between the filled values and the missing values is about 3.507.
(2) The K-means algorithm based on multiple distances is used to fill missing reservoir health monitoring data. Analysis and comparison show that the variant based on the Spearman correlation coefficient gives the best filling effect: over repeated runs, the mean square error between the filled values and the missing values is about 0.155.
(3) The K-means algorithm based on the Spearman correlation coefficient is used to fill missing values in monitoring data with high or low dispersion, and the mean is used to fill missing values in monitoring data that change little.
(4) For outliers that fall outside the scoring range, the outlier detection method based on Mahalanobis distance works better and has higher detection accuracy; for outliers with poor scores, the method based on Euclidean distance works better and has a wider detection range.
(5) The box plot method is used to detect outliers in reservoir health monitoring big data. The results show that the box plot directly expresses the median, tail length, outliers, and distribution interval of the monitoring data, and is suitable for macroscopic analysis of outliers.
(6) The box plot method and outlier detection based on Mahalanobis distance are used together to detect outliers in reservoir health monitoring big data. The combination of the two methods expresses outliers intuitively and locates them effectively.

The research results show that, for reservoir health monitoring data that are large in scale, cover many monitoring indicators, and have complex and diverse data types, the above methods can fill missing values quickly and effectively and can visually express and accurately locate outliers. This maximizes the quality and authenticity of reservoir health monitoring data and thus improves the accuracy of data analysis. It is an important prerequisite for realizing optimal scheduling of reservoir projects and ensuring the efficient, safe, and healthy operation of reservoirs.
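
The three cleaning steps named in the abstract (K-nearest neighbor filling with Manhattan distance, the box plot rule, and Mahalanobis-distance outlier detection) can be sketched as follows. This is only an illustration in Python with NumPy, not the thesis's MATLAB programs, which are not reproduced on this page. The function names, the default `k`, the 1.5 × IQR whisker, and the distance threshold of 3 are all assumptions chosen for the example.

```python
import numpy as np


def knn_fill_manhattan(data, k=3):
    """Fill NaNs in each row with the mean of the k nearest complete rows.

    Nearness is measured with Manhattan distance over the columns that
    are observed in the incomplete row. (Hypothetical helper.)
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    complete = data[~np.isnan(data).any(axis=1)]  # rows without missing values
    for i, row in enumerate(data):
        missing = np.isnan(row)
        if not missing.any():
            continue
        # Manhattan distance to every complete row, observed columns only
        dist = np.abs(complete[:, ~missing] - row[~missing]).sum(axis=1)
        nearest = np.argsort(dist)[:k]
        filled[i, missing] = complete[nearest][:, missing].mean(axis=0)
    return filled


def boxplot_outliers(values, whisker=1.5):
    """Flag values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - whisker * iqr) | (values > q3 + whisker * iqr)


def mahalanobis_outliers(data, threshold=3.0):
    """Flag rows whose Mahalanobis distance from the sample mean exceeds threshold."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Sample covariance of the columns; pinv tolerates a singular matrix
    cov = np.atleast_2d(np.cov(data, rowvar=False))
    cov_inv = np.linalg.pinv(cov)
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return np.sqrt(np.maximum(d2, 0.0)) > threshold
```

A box plot flag on a single indicator and a Mahalanobis flag on the joint record can then be compared, in the spirit of conclusion (6). Swapping the Manhattan distance for another metric only changes the line that computes `dist`.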

Keywords/Search Tags:

Reservoir Health, Big Data, Data Cleaning, Missing Value Filling, Outlier Detection

Related items

1	Research On Intelligent Water Data Big Data Cleaning Algorithm Based On Stereo Sensing
2	Research On Data Cleaning Method Of Cell Intelligent Manufacturing
3	Research On Missing Data Filling Algorithm For Automatic Monitoring Data Of Slope
4	Research On Data Cleaning And Repair Methods For Vessel Status Data
5	Design Of Qingdao Jiaodong International Airport Health Monitoring System And Numerical Simulation Analysis Of Data Missing Repair
6	Bridge Health Monitoring Data Cleaning Method And System Design
7	Urban Road Traffic Flow Data Cleaning Technology And System Implementation
8	Data-driven Reconstruction Of Missing Data And Power Load Forecasting
9	Research On Cleaning Method Of Power Quality Data Based On Correlation Analysis
10	The Research Of Real-time Data Cleaning In The Ship Monitoring System