
Research On Cleaning Method Of Big Data For Reservoir Health Monitoring

Posted on: 2020-05-27  Degree: Master  Type: Thesis
Country: China  Candidate: W H Li  Full Text: PDF
GTID: 2392330575964156  Subject: Architecture and civil engineering
Abstract/Summary:
In recent years, with the rapid adoption of big data mining technology in the water conservancy industry and the spread of the concept of reservoir health management, reservoir health management platforms have accumulated a large volume of reservoir health monitoring data. These massive data are an important basis for establishing a comprehensive reservoir health diagnosis model and for realizing reservoir health warnings, dispatch optimization, and similar functions. Over the long-term process of data collection and storage, network fluctuations, sensor failures, human error, and other factors have introduced missing values and outliers into the monitoring data. These missing values and outliers severely reduce the authenticity of the monitoring data. They not only affect the construction of an intelligent comprehensive reservoir health diagnosis model but also interfere with accurate prediction and warning of reservoir health status, and can even provide misleading information that leads reservoir managers to make wrong decisions with irreparable consequences. It is therefore of great engineering significance to clean the massive reservoir health monitoring data, which show the typical big data characteristics of large volume, complex variety, high velocity, low value density, and a need for veracity.

In this thesis, MATLAB is used to program a K-means algorithm based on multiple distances, a K-nearest neighbor algorithm based on multiple distances, the box plot method, an outlier detection method based on Mahalanobis distance, and an outlier detection method based on Euclidean distance. In the K-means and K-nearest neighbor programs, Euclidean distance, Manhattan distance, Minkowski distance, Chebyshev distance, cosine distance, correlation distance, the Spearman correlation coefficient, Hamming distance, and Jaccard distance are introduced as metrics. In the distance-based outlier detection programs, Euclidean distance and Mahalanobis distance are introduced as metrics. The programs are used to fill missing values and detect outliers in the reservoir health monitoring data. The filling effect is analyzed quantitatively through the error between the filled values and the values they replace, and the outlier detection results are summarized and analyzed. The main conclusions are as follows:

(1) The K-nearest neighbor algorithm based on multiple distances is used to fill the missing reservoir health monitoring data. Analysis and comparison show that the K-nearest neighbor algorithm based on Manhattan distance has the best filling effect. Over repeated runs, the mean square error between the filled values and the missing values is about 3.507.

(2) The K-means algorithm based on multiple distances is used to fill the missing reservoir health monitoring data. Analysis and comparison show that the K-means algorithm based on the Spearman correlation coefficient has the best filling effect. Over repeated runs, the mean square error between the filled values and the missing values is about 0.155.

(3) The K-means algorithm based on the Spearman correlation coefficient is used to fill missing values in monitoring data with high or low dispersion, while the mean is used to fill missing values in monitoring data that change little.

(4) For outliers that fall outside the scoring range, the outlier detection method based on Mahalanobis distance works better and has higher detection accuracy; for outliers with poor scores, the outlier detection method based on Euclidean distance works better and has a wider detection range.

(5) The box plot method is used to detect outliers in the reservoir health monitoring big data. The results show that the box plot directly expresses the median, tail length, outliers, and distribution interval of the monitoring data, and is suitable for macroscopic analysis of outliers.

(6) The box plot method and the outlier detection method based on Mahalanobis distance are used together to detect outliers in the reservoir health monitoring big data. The combination of the two methods expresses the outliers intuitively and locates them effectively.

The results show that, for reservoir health monitoring big data characterized by large scale, numerous monitoring indicators, and complex and diverse data types, the above methods can fill missing values quickly and effectively, express outliers visually, and locate them accurately. This maximizes the quality and authenticity of the reservoir health monitoring data and thereby improves the accuracy of data analysis. It is an important prerequisite for realizing optimal scheduling of reservoir projects and ensuring the efficient, safe, and healthy operation of reservoirs.
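
As a rough illustration of two of the techniques named in the abstract, the sketch below fills missing values with a K-nearest neighbor rule under Manhattan distance and flags outliers with the box plot (interquartile range) rule. It is not the thesis's MATLAB code, which this page does not reproduce. It is written in Python using only the standard library, and the function names `knn_fill` and `boxplot_outliers`, the choice `k=2`, the conventional whisker factor of 1.5, and the sample readings are all assumptions made for the example.

```python
from statistics import quantiles


def manhattan(a, b, cols):
    """Manhattan (L1) distance between rows a and b over the given columns."""
    return sum(abs(a[c] - b[c]) for c in cols)


def knn_fill(rows, k=2):
    """Fill None entries with the column mean of the k nearest complete rows.

    Distance is measured only on the columns observed in the incomplete row.
    Assumes at least one row has no missing values.
    """
    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        observed = [c for c, v in enumerate(r) if v is not None]
        neighbours = sorted(
            complete, key=lambda q: manhattan(r, q, observed)
        )[:k]
        filled.append([
            v if v is not None
            else sum(q[c] for q in neighbours) / len(neighbours)
            for c, v in enumerate(r)
        ])
    return filled


def boxplot_outliers(values, whisker=1.5):
    """Return indices of values outside [Q1 - w*IQR, Q3 + w*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical two-indicator readings; None marks a missing value.
readings = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [2.1, None],
    [50.0, 500.0],
]
filled_readings = knn_fill(readings, k=2)

# Hypothetical single-indicator series with one suspicious spike.
series = [10.0, 11.0, 12.0, 11.5, 10.5, 40.0]
flagged = boxplot_outliers(series)
```

Swapping `manhattan` for another distance function is the only change needed to compare metrics in this sketch, which mirrors the kind of comparison the abstract describes. A Mahalanobis-based detector would additionally need the inverse covariance matrix of the indicators and is omitted here.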
Keywords/Search Tags: Reservoir Health, Big Data, Data Cleaning, Missing Value Filling, Outlier Detection