
Research On The Establishment And Application Of A Big Data Mining Framework For Geological Disasters Based On Hadoop

Posted on: 2020-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: W H Guo
Full Text: PDF
GTID: 2430330596997367
Subject: Surveying and mapping engineering

Abstract/Summary:
With the growing attention paid to geological disasters, monitoring methods have become increasingly diversified. Long-term monitoring of geological disasters yields massive volumes of data, and how to analyze these large-scale data in a timely and effective manner, and thereby provide medium- and long-term early warning, has become a hot research topic. Applying big data technology to geological disaster data mining in order to achieve medium- and long-term early warning and monitoring has become a consensus. In medium- and long-term landslide early warning research, the computational efficiency of the prediction model plays a key role in the timeliness of landslide prevention and control. This study identifies two main problems in medium- and long-term landslide warning: (1) in a big data context, database throughput is low when large-scale data must be read and written; (2) distributed landslide spatiotemporal prediction models lack algorithm-level optimization matched to the type and characteristics of the data. Based on these findings, optimizing the storage model and the geological disaster prediction algorithm under big data technology is the key research content of this paper; the optimized schemes are then used to establish a Hadoop-based big data mining framework for geological disasters, and the framework is applied to verify its feasibility.

Establishing the Hadoop-based geological disaster big data mining framework involves three parts: building a distributed computing environment, optimizing the data storage model, and optimizing the geological disaster prediction model. The main work and conclusions are as follows:

(1) Once the distributed computing environment is established, the data storage model must be optimized to make the mining framework practical; this optimization is key to the framework's feasibility. MongoDB is used as the database for landslide hazard data. Comparing the data throughput of the Mongo-Hadoop connector with MongoDB's built-in MapReduce shows that Mongo-Hadoop performs better, so Mongo-Hadoop is selected as the technical support of the database. The default split size, however, is not well suited to Mongo-Hadoop, so data splitting is studied further: within a certain range of data volumes, setting the split size to 100 MB or more greatly improves database performance, as illustrated in the sketch below.
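As an illustration of this storage-layer tuning, the following is a minimal sketch of a Hadoop job that reads landslide documents through the Mongo-Hadoop connector with the input split size raised to 100 MB. The MongoDB URI, the geohazard.landslide_monitoring collection, and the "site" field are hypothetical, and the `mongo.input.split_size` key follows the connector's conventions as best understood here; treat this as a sketch under those assumptions, not the thesis's actual job.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;
import org.bson.BSONObject;

import com.mongodb.hadoop.MongoInputFormat;

public class LandslideSplitDemo {

    /** Counts documents per monitoring site ("site" is a hypothetical field). */
    public static class SiteCountMapper
            extends Mapper<Object, BSONObject, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(Object id, BSONObject doc, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(new Text(String.valueOf(doc.get("site"))), ONE);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Read monitoring documents through the Mongo-Hadoop connector;
        // database and collection names here are placeholders.
        conf.set("mongo.input.uri",
                "mongodb://node1:27017/geohazard.landslide_monitoring");
        // Raise the input split size to 100 MB, the range the experiments
        // above found to give a large throughput improvement.
        conf.setInt("mongo.input.split_size", 100);

        Job job = Job.getInstance(conf, "landslide-split-demo");
        job.setJarByClass(LandslideSplitDemo.class);
        job.setInputFormatClass(MongoInputFormat.class);
        job.setMapperClass(SiteCountMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```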
(2) Optimizing the geological disaster prediction model is the key work of the mining framework. In medium- and long-term landslide early warning research, the computational efficiency of the prediction model is crucial for disaster prevention and control. This paper takes the Apriori algorithm, the most widely used algorithm in spatial data mining, as an example. The original Apriori algorithm obtains strong association rules by scanning the data layer by layer and generates a large number of frequent itemsets, so the pressure on the hardware grows as the data volume grows. The existing MapReduce-based Apriori algorithm is optimized mainly at the computation level rather than the algorithm level. Starting from the underlying algorithm, this paper proposes an IAprioriMR algorithm based on the MapReduce framework and uses the Webdoc dataset as experimental data to verify its computational efficiency. Comparative analysis shows that once the experimental data exceeds 320,000 instances, the optimized IAprioriMR algorithm improves efficiency significantly compared with the traditional parallel AprioriMR algorithm, and the improvement becomes more pronounced as nodes are added to the MapReduce environment.

(3) Landslide hazard data from the Three Parallel Rivers (Sanjiang) area are used to verify the feasibility of the optimized geological disaster big data mining framework. The landslide monitoring data set from 2000 to 2011 is used to train the model, and the data set from 2012 to 2013 is used as test data. Following the IAprioriMR prediction model's rules, the groundwater level, rainfall, water level of the rivers in the Three Parallel Rivers area, and cumulative displacement at the landslide monitoring points are set as the inducing factors of landslide occurrence, with the landslide itself as the consequent. Between 2012 and 2013, a total of 21 landslides were recorded in the area; by applying rules with confidence greater than 0.7, the mining framework correctly identified 16 of these landslide events, giving an accuracy of 76.2%. The mining framework established in this paper is therefore feasible for medium- and long-term early warning research on geological disasters. A worked confidence calculation is sketched below.
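To make the rule-application step concrete, the following self-contained sketch computes rule confidence as support(A ∪ B) / support(A) and keeps a rule only when confidence exceeds the 0.7 threshold used above. The factor names and itemset counts are invented for illustration; in the thesis the counts would come from the IAprioriMR mining stage.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RuleConfidence {

    /** confidence(A -> B) = support(A ∪ B) / support(A). */
    static double confidence(Map<Set<String>, Integer> counts,
                             Set<String> antecedent, String consequent) {
        Set<String> both = new HashSet<>(antecedent);
        both.add(consequent);
        Integer a = counts.get(antecedent);
        Integer ab = counts.get(both);
        return (a == null || ab == null) ? 0.0 : (double) ab / a;
    }

    public static void main(String[] args) {
        // Hypothetical frequent-itemset counts over the 2000-2011 training
        // records; the real counts are produced by the mining pass.
        Map<Set<String>, Integer> counts = new HashMap<>();
        counts.put(new HashSet<>(Arrays.asList(
                "heavy_rainfall", "high_groundwater")), 40);
        counts.put(new HashSet<>(Arrays.asList(
                "heavy_rainfall", "high_groundwater", "landslide")), 32);

        Set<String> antecedent = new HashSet<>(Arrays.asList(
                "heavy_rainfall", "high_groundwater"));
        double c = confidence(counts, antecedent, "landslide");

        // Keep the rule only if confidence exceeds the 0.7 threshold used
        // in the Three Parallel Rivers experiment.
        if (c > 0.7) {
            System.out.printf("rule kept, confidence = %.2f%n", c);
        }
    }
}
```

With these placeholder counts, the rule "heavy rainfall and high groundwater imply landslide" has confidence 32/40 = 0.80 and would be retained for prediction.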
Keywords/Search Tags:Association rules, Distributed computing, Landslide, Geological hazard prediction model