The Design and Implementation of an Agricultural Big Data Mining System Based on Spark

Posted on: 2019-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: E X Guo
Full Text: PDF
GTID: 2333330548953312
Subject: Master of Agricultural Extension
Abstract/Summary:
With the development of information technology, every industry now generates large amounts of data. This growth in data has driven a new wave of technological innovation, and we have moved from the Internet era into the era of big data. Agriculture is China's primary industry. With improved infrastructure and advances in sensor, network, and remote sensing technologies, large volumes of data are now produced at every stage of agriculture, including agricultural resources, production, markets, and management. How to use these data effectively and extract valuable information to better serve agriculture has become a frontier research topic.

Chinese agriculture spans many fields, has a complex structure, and is influenced by many factors. Agricultural data are stored on diverse media, have complex structures and high dimensionality, and are also time-sensitive and difficult to analyze. After analyzing the mature big data technologies currently available, this thesis selects the Hadoop Distributed File System (HDFS) to store massive heterogeneous agricultural data and the in-memory Spark computing framework to process real-time agricultural data. Agricultural data also contain a wealth of information, and mining that information is of great significance for agricultural development. Clustering is a common data mining method; this thesis selects the spectral clustering algorithm for its strong performance, using it to uncover information hidden in agricultural data, discover patterns, and provide decision support for agriculture and guidance for practitioners.

This thesis analyzes the system's requirements with respect to agricultural data and designs a Spark-based mining system for storing and analyzing massive agricultural information. The system uses a three-layer architecture. The bottom layer is the data layer, which is mainly responsible for data collection and distributed storage. The second layer is the business layer, which provides the computing framework and logic processing; the mining algorithms integrated into the system are implemented at this layer. The top layer is the interaction layer, which handles interaction between the system and its users.

Following this design, the system implements storage, computation, analysis, and mining of agricultural big data. It builds a distributed HDFS cluster and a parallel Spark computing cluster, and each module is implemented with the relevant components of the Spark ecosystem: the agricultural data query and operation module uses Spark SQL, and the parallel spectral clustering module uses the GraphX component to analyze agricultural big data. Finally, the system is tested. The thesis obtains and analyzes soil fertility data provided by the China Soil Database. The results show that the spectral clustering algorithm is of practical value for agricultural data analysis, and that the distributed, parallel big data storage and computing framework greatly improves the computing performance of the data mining algorithm. The design and development of an agricultural data mining system based on big data technology is therefore of practical significance for promoting the development of agricultural informatization.
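
To illustrate the idea behind spectral clustering, the following is a minimal single-machine sketch in plain Python. It is not the thesis's parallel GraphX implementation, which the abstract does not detail. It assumes a Gaussian similarity graph, the unnormalized Laplacian L = D - W, and a two-way split by the sign of the Fiedler vector, which is approximated here by power iteration. The function names, the `sigma` parameter, and the sample values (two made-up soil measurements per sample) are hypothetical and are not taken from the China Soil Database.

```python
import math
import random


def gaussian_similarity(points, sigma=1.0):
    """Dense similarity matrix W with W[i][j] = exp(-||xi - xj||^2 / (2 sigma^2))."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def fiedler_vector(w, iterations=500, seed=0):
    """Approximate the eigenvector of the second-smallest eigenvalue of L = D - W."""
    n = len(w)
    degree = [sum(row) for row in w]
    # By Gershgorin's theorem every eigenvalue of L is at most 2 * max(degree),
    # so M = shift*I - L is positive semidefinite and its largest eigenvalues
    # correspond to the smallest eigenvalues of L.
    shift = 2.0 * max(degree)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        # Remove the constant component (the trivial eigenvector of L).
        mean = sum(v) / n
        v = [x - mean for x in v]
        # Multiply by M, using (L v)_i = degree_i * v_i - sum_j W_ij * v_j.
        v = [
            shift * v[i] - (degree[i] * v[i] - sum(w[i][j] * v[j] for j in range(n)))
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def spectral_bipartition(points, sigma=1.0):
    """Split points into two groups by the sign of the Fiedler vector."""
    v = fiedler_vector(gaussian_similarity(points, sigma))
    # Orient the labels so that the first point always receives label 0.
    return [0 if x * v[0] >= 0 else 1 for x in v]


# Hypothetical samples: two illustrative soil measurements per sample.
samples = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2),
]
labels = spectral_bipartition(samples)
print(labels)
```

The sketch builds the full similarity matrix in memory, which is only practical for small inputs. A distributed version such as the one the abstract describes would presumably represent the similarity graph across the cluster instead.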
Keywords/Search Tags:agriculture big data, HDFS, Spark, spectral clustering, data mining