| Understanding the characteristics of soils and their distribution in space and time is vital. Studying the soil profile and the soil information at its different levels (such as organic carbon) is of great significance for soil science research, including soil genesis and development and soil classification. Traditional soil profile investigation involves high-density interval sampling and laboratory chemical analysis. The process is time-consuming and laborious, and it makes the continuous vertical variation of soil properties difficult to quantify. Imaging spectroscopy has the advantage of integrating image and spectral information. Building on the quantitative analysis capability of vis-NIR soil spectra, it enables high-spatial-resolution mapping of soil attributes over the complete soil profile and overcomes the limitations of traditional soil profile attribute analysis. At present, imaging spectra are mostly used in soil attribute prediction and mapping studies that adopt a strategy of modeling and predicting with local samples. The generality and extrapolation performance of such models remain unclear, and resolving this requires the support of a large amount of sample data. A soil spectral library with rich and diverse information provides a large number of highly variable samples as the data basis for building local models. However, because of the heterogeneity of the soil samples in the library and the limited adaptability of the models, regional or local-scale models are usually not robust. Taking a complete 1 m deep soil profile as the research object, and on the premise that part of the target samples are included in the library, this study explores the feasibility of using distance algorithms based on soil spectral dissimilarity, combined with a soil spectral library, to construct local models that predict soil organic carbon (SOC) in the soil profile. Using a soil spectral library (SSL) composed of 677 soil columns and six target samples from the global soil
spectral library (GSSL), 200 representative spectra are extracted from the whole profile with the K-means algorithm to form a local target test set (Test). Euclidean distance (ED), Mahalanobis distance (MD) and the spectral angle mapper (SAM) are used to measure the spectral dissimilarity between Test and the SSL and to generate distance matrices. According to the first 0.03%, 0.1%, 0.2%, 0.3%, 10% and 50% of each distance matrix, the library spectra most similar to Test are extracted from the SSL to construct local modeling sets (Local) of six capacities. Local models relating vis-NIR spectra to SOC content are established by partial least squares regression (PLSR) and random forest (RF), and model accuracy is verified with point soil samples. The "capacity-accuracy" changes of Local under each distance algorithm are examined and explained in the principal component space of the spectra, and the various local models are used for fine characterization of SOC content in the soil profile. The main results and conclusions are as follows: (1) Calculating spectral dissimilarity can improve the accuracy of local SOC prediction from a large and complex soil spectral library. For the six PLSR and RF local models established with the ED, MD and SAM algorithms, the modeling R2 reaches up to 0.63 for PLSR and up to 0.92 for RF. However, the inflection point of "capacity-accuracy" differs significantly among the three algorithms. The ED and SAM algorithms have clear advantages over MD: Local at the first four proportions not only gives the best accuracy but also uses fewer library samples. (2) The local models are used to predict the SOC content of the soil profile. The prediction accuracy of the PLSR and RF local models established with the ED algorithm improves significantly when only a few samples are used at the first four proportions (PLSR: R2 = 0.86 to 0.88, RMSE = 1.07% to 1.37%; RF: R2 = 0.65 to 0.67, RMSE = 0.77% to 0.86%). The prediction accuracy of the PLSR and RF local models established with the MD
algorithm is slightly better than that obtained with the whole library at the latter two proportions. The PLSR and RF local models established with the SAM algorithm not only achieve higher prediction accuracy at the first four proportions (PLSR: R2 = 0.81 to 0.9, RMSE = 1% to 1.13%; RF: R2 = 0.74 to 0.8, RMSE = 0.85% to 0.95%) but also use the fewest samples. (3) The various PLSR and RF local models are used to draw SOC content distribution maps of the soil profile. The PLSR models of ED and SAM and both the PLSR and RF models of MD cannot estimate the SOC content distribution of the soil profile. The RF models at the first four proportions of ED and SAM can use Local to achieve fine characterization of profile SOC content with reasonable accuracy, although high values are underestimated and low values are overestimated. Compared with ED, SAM takes into account the waveform and amplitude characteristics of the spectrum and is therefore more advantageous: when 9% of the library samples are used, it gives the best prediction accuracy (R2 = 0.79, RMSE = 0.85%) with the fewest library samples. In summary, the RF local models established with the three algorithms have better modeling accuracy than the PLSR models, and the inflection point of "capacity-accuracy" differs significantly among the three algorithms. SAM, which takes into account the waveform and amplitude of the spectrum, and ED, which computes the Euclidean distance, are more suitable than MD, which computes a covariance-based distance, for constructing local models from the spectral library that improve on the modeling accuracy of the global model. The PLSR models of all three algorithms cannot predict the SOC content of the soil profile: although the accuracy of PLSR is high, it performs poorly when used to estimate the SOC content distribution of the complete soil profile. This shows that accuracy is not the only criterion for judging the quality of a model. With the MD algorithm, the PLSR and RF
local models cannot estimate the distribution of SOC content in the soil profile. Overall, the SAM algorithm is more suitable for constructing local models from a soil spectral library to predict soil profile SOC content, and 9% of the library samples can serve as a reference capacity for the local model. |
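
The distance-based selection of a local modeling set described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the study's implementation. The function names, the pseudo-inverse covariance estimated from the library for MD, and the reading of "the first x% of the distance matrix" as the smallest x% of all test-to-library distances are assumptions made for the example.

```python
import numpy as np


def euclidean(test, lib):
    """Pairwise Euclidean distance (ED), shape (n_test, n_lib)."""
    diff = test[:, None, :] - lib[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def mahalanobis(test, lib):
    """Pairwise Mahalanobis distance (MD).

    Assumption: the covariance is estimated from the library spectra and
    inverted with a pseudo-inverse to tolerate collinear bands.
    """
    cov_inv = np.linalg.pinv(np.cov(lib, rowvar=False))
    diff = test[:, None, :] - lib[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def spectral_angle(test, lib):
    """Pairwise spectral angle (SAM) in radians between spectra."""
    num = test @ lib.T
    den = (np.linalg.norm(test, axis=1)[:, None]
           * np.linalg.norm(lib, axis=1)[None, :])
    return np.arccos(np.clip(num / den, -1.0, 1.0))


def select_local(dist, fraction):
    """Return library indices behind the smallest `fraction` of distances.

    Assumption: the distance matrix is ranked as a whole, and every library
    sample that appears among the retained entries joins the local set.
    """
    n_keep = max(1, int(np.ceil(fraction * dist.size)))
    flat_order = np.argsort(dist, axis=None)[:n_keep]
    _, lib_idx = np.unravel_index(flat_order, dist.shape)
    return np.unique(lib_idx)
```

The indices returned by `select_local` would then pick the library spectra and SOC values used to fit a PLSR or RF model, with `fraction` varied over the capacities listed above (for example `0.0003` for 0.03%).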