| In the geosciences, spatial prediction is the process of using a connection function to predict values at unknown locations from values at observed locations. Regression kriging (RK) and machine learning (ML) algorithms such as random forest (RF) have been widely used in soil water quality inversion, air quality prediction, and other fields. Recently, RF prediction models that take spatial autocorrelation into account have gradually been developed. They introduce buffer distances, the observed values of the nearest points, and the horizontal distances from the nearest points to the prediction points, and their effectiveness has been confirmed. However, they have two shortcomings: the way distance is introduced leads to biased spatial predictions, and the models underfit on small-sample data sets. This paper studies these two shortcomings, and the main work is summarized as follows: (i) To overcome the shortcomings of distance handling in traditional RF spatial prediction models, this paper introduces an inverse distance weighted strategy into the existing models. To obtain more accurate results, the Random Forest with Inverse Distance Weighted (RFIDW) model is proposed, which combines the observed values of the nearest points with the distances from the nearest points to the prediction points. We compared RFIDW with traditional RF spatial prediction models on the Spatial Interpolation Comparison 1997 (SIC97) precipitation data set. As expected, RFIDW obtained more accurate spatial predictions than RFSI and RFsp in this study, and it also output effective uncertainty information. (ii) To address the underfitting of traditional RF spatial prediction models on small-sample data sets, the Random Forest Spatial prediction model with Modified Upsampling (RFSI-MUS) is proposed by introducing a data augmentation strategy. Its characteristics are mainly reflected in three aspects. First, the stations are clustered according to the similarity of the prediction factors and the target variable, together with proximity in spatial distance. Second, the similarity of the prediction factors and the target variable is considered when screening the nearest points. Finally, the extreme values of the prediction factors and the target variable within each cluster are used during data augmentation. The daily precipitation data of Chongqing in January 2018 were used to verify the validity of the model. The experimental results show that combining similarity-aware upsampling with a traditional RF spatial prediction model effectively improves the accuracy of spatial prediction. Through these studies, this paper demonstrates the superiority of spatial prediction models that consider both proximity and similarity. They address the insufficient consideration of distance and the underfitting on small-sample data sets in the RFSI model, and they improve the accuracy of the prediction results. |
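
The inverse distance weighted strategy mentioned in (i) can be illustrated with a small sketch. The abstract does not give the exact RFIDW formulation, so what follows is only one plausible reading: each of the k nearest observations is weighted by an inverse power of its distance to the prediction point, and the weighted values serve as covariates for an RF model. The function name `idw_neighbor_features`, the power of 2, and the choice of k are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def idw_neighbor_features(obs_xy, obs_values, query_xy, k=3, power=2.0, eps=1e-12):
    """Hypothetical helper: IDW-weighted values of the k nearest observations.

    Returns an array of shape (n_query, k). Each row sums to the classic
    inverse distance weighted estimate at that query location.
    """
    obs_xy = np.asarray(obs_xy, dtype=float)
    obs_values = np.asarray(obs_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)

    # Pairwise Euclidean distances, shape (n_query, n_obs).
    dists = np.linalg.norm(query_xy[:, None, :] - obs_xy[None, :, :], axis=2)

    # Column indices of the k nearest observations for each query point.
    idx = np.argsort(dists, axis=1)[:, :k]
    near_d = np.take_along_axis(dists, idx, axis=1)
    near_v = obs_values[idx]

    # Inverse distance weights (assumed power of 2), normalised per query point.
    # eps avoids division by zero when a query coincides with an observation.
    weights = 1.0 / (near_d + eps) ** power
    weights /= weights.sum(axis=1, keepdims=True)

    return weights * near_v


if __name__ == "__main__":
    stations = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]
    rainfall = [10.0, 20.0, 40.0]
    features = idw_neighbor_features(stations, rainfall, [[1.0, 0.0]], k=2)
    print(features, features.sum(axis=1))
```

When such features are built for the training stations themselves, each station would normally be excluded from its own neighbour set; otherwise the zero-distance neighbour receives nearly all the weight and the covariate simply reproduces the target.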