Winter wheat yield prediction is an important research direction in agriculture. With advances in science and technology, machine-learning prediction models have become increasingly prominent, and as more researchers work in this area, a growing variety of predictive indices has come into use. Shandong is a major agricultural province, and its agricultural production and grain output hold an important position nationally. In recent years, although winter wheat production in Shandong has increased, the sown area has been shrinking rapidly. Large-scale, reliable winter wheat production forecasts are therefore critical for food trade and policymaking in meeting the challenges of climate change, population growth, and growing food demand. This study takes winter wheat in Shandong Province as its research object, builds prediction models with machine learning algorithms, and uses multimodal data (climate data, soil data, satellite data, and spatiotemporal variables) as predictors of winter wheat production. The core work is as follows:

(1) In the initial stage, variables were selected and preprocessed from winter wheat planting area and yield data, climate data, soil data, and satellite data. The data were obtained from a variety of sources and are therefore multimodal, comprising text data, gridded data, and hierarchical-format data. The first step is format unification: the multimodal data are merged into comma-separated value (CSV) format. The second is spatial unification: the data include county-level data, kilometer-level data, and longitude-latitude grid data, and to keep the data efficient they are unified to county-level resolution. The last is temporal unification, which converts all data
to monthly resolution. In total, 74 variables were compiled.

(2) Six winter wheat yield prediction models were built with six machine learning algorithms, namely the least absolute shrinkage and selection operator (LASSO), ridge regression (RIDGE), support vector regression (SVR), random forest (RF), extreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM), and their performance was compared. Prediction models were then trained on datasets organized by the time-division method proposed in this paper and by the time-division method of previous studies, and the resulting models were compared to assess whether the proposed division is reasonable. The best-performing model was then used to identify the optimal forecast period for winter wheat yield, that is, the month in which it starts and the number of months it spans. The results show that the model trained on the dataset organized by the proposed time-division method outperforms the one trained with the division used in previous studies. Among the six prediction models, the prediction accuracy of the XGBoost model is much higher than that of the other five. The best forecast period is November to January. Among the 74 variables, the three with the highest contribution are all satellite factors, which suggests that satellite data have strong application prospects in agriculture.

(3) The paper also examines the predictors. First, it investigates whether multimodal data have a positive effect on the prediction model. Second, it explores the effect of spatiotemporal variables on the prediction model. Finally, a new vegetation index, EVI2, is introduced to predict yield. EVI2 (the two-band enhanced vegetation index, which omits the blue band), a new product from NASA, was tested to see whether it improves prediction accuracy more than solar-induced chlorophyll fluorescence (SIF). The results show that adding spatiotemporal variables to the dataset improves accuracy, and that adding soil data and satellite data to the prediction
model could also improve the prediction accuracy of winter wheat yield to a certain extent. Compared with SIF, EVI2 has a stronger effect, but both bring a considerable increase in prediction accuracy.
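
The spatial and temporal unification described in item (1) can be illustrated with a minimal sketch. It is not the pipeline used in this work: the observations, the `grid_to_county` lookup, the county names, and the variable name `mean_temperature` are all hypothetical, and an unweighted mean is assumed as the aggregation rule. Grid-cell records are assigned to counties, averaged per month, and written out in comma-separated value format.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical daily grid-cell observations: (grid_id, "YYYY-MM-DD", value).
observations = [
    ("g1", "2019-11-03", 10.0),
    ("g1", "2019-11-20", 14.0),
    ("g2", "2019-11-11", 12.0),
    ("g3", "2019-12-05", 4.0),
    ("g3", "2019-12-25", 6.0),
]

# Hypothetical lookup from each grid cell to the county that contains it.
grid_to_county = {"g1": "County A", "g2": "County A", "g3": "County B"}


def unify(observations, grid_to_county):
    """Average all observations falling in the same (county, month)."""
    buckets = defaultdict(list)
    for grid_id, date, value in observations:
        county = grid_to_county[grid_id]  # spatial unification: grid -> county
        month = date[:7]                  # temporal unification: day -> "YYYY-MM"
        buckets[(county, month)].append(value)
    # Assumption: a plain unweighted mean; area weighting is not modeled here.
    return {key: mean(values) for key, values in sorted(buckets.items())}


def to_csv(table, variable_name):
    """Serialize the county-month table as comma-separated values."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["county", "month", variable_name])
    for (county, month), value in table.items():
        writer.writerow([county, month, value])
    return buffer.getvalue()


monthly = unify(observations, grid_to_county)
csv_text = to_csv(monthly, "mean_temperature")
print(csv_text)
```

Each predictor processed this way yields one column keyed by county and month, so tables from different sources can be joined on those two keys into a single feature matrix.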