Research And Development Of Agricultural Big Data Processing And Yield Forecast Cloud Platform Fused Spark

Posted on: 2020-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhang
GTID: 2393330575965058
Subject: Engineering

Abstract/Summary:
In recent years, with the spread of networks and Internet of Things (IoT) technology, the integration of agriculture and computer technology in China has steadily deepened, and the data generated at every stage of production is growing explosively. Finding effective methods to analyze and process massive agricultural data and extract valuable information from it has become an important topic in agricultural informatization. China's agricultural structure is relatively complex: many factors affect crop yields, and the fields involved are broad. China therefore urgently needs a smart agriculture platform that supports big data processing and yield forecasting.

Against the background of the Modern Agriculture Demonstration Park in Jizhou District, Ji'an City, Jiangxi Province, this thesis studies highly reliable and efficient processing of agricultural big data and constructs an Artificial Neural Network (ANN) yield forecasting model combined with the Full Subset Regression (FSR) feature selection method. It also analyzes the practical application requirements in detail and designs a private cloud platform, built on Spark, for agricultural big data processing and yield forecasting. The agricultural big data consists of the massive crop growth-environment data collected by the demonstration park. Distributed data storage is provided by idle computers that are virtualized and configured as a Hadoop cluster; distributed computing and data management are provided by MapReduce, Spark, Hive and related components; and data analysis and crop yield prediction are provided by Spark SQL and Spark MLlib.

The implemented platform includes: a highly reliable, fully distributed cluster that solves the problem of the whole platform becoming unavailable when the master node fails; configuration of the relevant mechanisms in Hadoop and Spark to provide the required functions, together with efficient SQL-style processing of the large volume of collected environment data to obtain the sample data set needed for the yield prediction experiments, namely the ten yield-influencing factor data sets selected in this thesis; and an FSR-ANN yield forecasting model for the platform based on Spark MLlib.

Experiments using several indicators compare the processing efficiency of the two frameworks and the prediction performance of the two yield forecasting models. The results show that, for processing massive cabbage environment data, Spark SQL is more efficient than Hive SQL across different numbers of worker nodes and different input file sizes. For the park's ten yield-influencing factors, the FSR-ANN model achieves a higher correlation coefficient, a smaller root mean square error, and a narrower range of prediction-error fluctuation than the plain ANN model, and therefore gives a better overall prediction. The cloud platform developed in this thesis meets the practical needs of agricultural big data processing and yield forecasting and contributes to the development of agricultural informatization in China.
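The abstract names Full Subset Regression (FSR) as the feature selection step but does not describe it further. As a minimal sketch, assuming FSR means exhaustive best-subset linear regression scored by adjusted R² (the scoring criterion is an assumption; the thesis may use another, such as AIC or Mallows' Cp), selection over candidate yield factors can be written in plain Python. The functions `solve_linear`, `ols_sse` and `full_subset_selection` are hypothetical names, not part of Spark MLlib.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_sse(X: Sequence[Sequence[float]], y: Sequence[float],
            cols: Tuple[int, ...]) -> float:
    """Fit y ~ intercept + chosen columns via the normal equations; return SSE."""
    rows = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def full_subset_selection(X: Sequence[Sequence[float]],
                          y: Sequence[float]) -> Tuple[Tuple[int, ...], float]:
    """Score every non-empty subset of predictors; return (best columns, adjusted R^2)."""
    n, k = len(y), len(X[0])
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    if sst == 0:
        raise ValueError("target is constant")
    best: Tuple[Tuple[int, ...], float] = ((), float("-inf"))
    for size in range(1, k + 1):
        if n - size - 1 <= 0:
            break  # adjusted R^2 is undefined without residual degrees of freedom
        for cols in combinations(range(k), size):
            try:
                sse = ols_sse(X, y, cols)
            except ValueError:
                continue  # collinear subset: skip it
            adj_r2 = 1 - (sse / (n - size - 1)) / (sst / (n - 1))
            if adj_r2 > best[1]:
                best = (cols, adj_r2)
    return best
```

With ten candidate factors this fits 2^10 - 1 = 1023 regressions, which is cheap; the cost doubles with every extra factor, so exhaustive search only suits small factor sets like the one described here. The selected columns would then be the inputs passed to the ANN.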
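The ANN and FSR-ANN models are compared by correlation coefficient, root mean square error, and the fluctuation range of the prediction error. A minimal stdlib sketch of these indicators follows, assuming Pearson's r for the correlation coefficient and the maximum minus the minimum signed error for the fluctuation range (both interpretations are assumptions; the abstract does not define them). The function name `evaluate` is hypothetical.

```python
import math
from typing import Dict, Sequence


def evaluate(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return Pearson r, RMSE and the range of signed prediction errors."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    r = cov / math.sqrt(var_a * var_p)
    errors = [p - a for a, p in zip(actual, predicted)]  # signed errors
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    error_range = max(errors) - min(errors)  # spread of the error fluctuation
    return {"r": r, "rmse": rmse, "error_range": error_range}
```

Calling `evaluate` on the same held-out yields for both models lets the three indicators be compared directly: a better model should show a higher `r`, a lower `rmse` and a smaller `error_range`.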
Keywords/Search Tags: Agricultural big data, Hadoop, Spark, Yield forecast, Fully distributed