Soil organic matter is a key component of soil fertility, and soil fertility is an important factor in ensuring crop growth. Accurate and rapid acquisition of the spatial distribution of soil organic matter is important for soil fertility evaluation, soil carbon pool research, precision agriculture, and the efficient and sustainable use of soil resources. In the era of precision agriculture, predicting the spatial distribution of soil organic matter content accurately and quickly has become a hot topic in academic research. Using the Jupyter Notebook development environment, the Google Earth Engine cloud platform, ENVI 5.3 and ArcGIS 10.6, this study processes Landsat-8 remote sensing imagery and the SRTM digital elevation model and extracts the annual maximum composite normalized difference vegetation index, annual maximum composite ratio vegetation index, annual maximum composite difference vegetation index, annual maximum composite normalized difference water index, elevation, slope and other indices. Together with soil type data, planting system data and spatial distance data, these make up a total of 12 environmental factors, which are combined with the reflectance of four bands of Gaofen-6 (GF-6) remote sensing imagery to build machine learning regression prediction models. Four machine learning models, Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Light Gradient Boosting Machine (LightGBM) and Extreme Gradient Boosting (XGBoost), are compared in terms of training efficiency and prediction performance. The following conclusions are drawn: (1) The modeling results show that the performance of the machine learning models differs considerably across datasets, so models should be selected according to the characteristics of each dataset. Predictions based on the rice dataset are better than those based on the other datasets. (2) After the
feature set of the dataset undergoes feature engineering, the training efficiency and predictive ability of the model improve. (3) Comparing the modeling results across datasets, the XGBoost model based on the paddy soil dataset performs best: after feature screening, its R², RMSE and MAE are 0.799, 1.660 and 1.391, respectively. (4) The XGBoost and LightGBM models train more efficiently than the RF and GBDT models. (5) Machine learning models can be used for rapid prediction and inversion of soil organic matter content over large areas. Finally, a soil organic matter spatial prediction model was written in Python within the integrated development environment, and the best-performing machine learning model for each dataset was used to invert the spatial distribution of soil organic matter in the study area.
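
As an illustration of the quantities named above, the following minimal Python sketch computes the four spectral indices for a single pixel, takes their annual maximum over several acquisition dates, and evaluates a set of predictions with R², RMSE and MAE. It is not the study's code. The function names, the sample reflectances and the sample observed and predicted values are all hypothetical, and the water index is assumed to be the green/near-infrared (McFeeters) form, which the abstract does not specify.

```python
import math


def spectral_indices(green, red, nir):
    """Return the four indices for one pixel's reflectances on one date."""
    return {
        "NDVI": (nir - red) / (nir + red),      # normalized difference vegetation index
        "RVI": nir / red,                       # ratio vegetation index
        "DVI": nir - red,                       # difference vegetation index
        "NDWI": (green - nir) / (green + nir),  # water index, McFeeters form (assumed)
    }


def annual_max_composite(scenes):
    """Per-index maximum over all scenes of one pixel within a year."""
    per_scene = [spectral_indices(*scene) for scene in scenes]
    return {name: max(values[name] for values in per_scene)
            for name in per_scene[0]}


def regression_metrics(observed, predicted):
    """R2, RMSE and MAE of predictions against observations."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return {"R2": 1 - ss_res / ss_tot,
            "RMSE": math.sqrt(ss_res / n),
            "MAE": mae}


# Hypothetical (green, red, nir) reflectances of one pixel on three dates.
scenes = [(0.08, 0.10, 0.30), (0.06, 0.05, 0.45), (0.09, 0.12, 0.25)]
composite = annual_max_composite(scenes)

# Hypothetical observed and predicted soil organic matter values (arbitrary units).
observed = [20.0, 25.0, 30.0, 35.0]
predicted = [22.0, 24.0, 29.0, 33.0]
metrics = regression_metrics(observed, predicted)

print(composite)
print(metrics)
```

In a workflow like the one described, the compositing step would run per pixel over a full image collection on a platform such as Google Earth Engine, and the metrics would be computed on held-out samples for each of the four models; the sketch only shows the arithmetic behind those steps.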