| The issues of agriculture, rural areas, and farmers have always been a focus of the Chinese government’s efforts to improve people’s livelihoods. Properly addressing these issues can not only improve farmers’ quality of life but also promote the country’s modernization. Grassroots agricultural and commercial loans play an irreplaceable role in resolving these issues and in developing the agricultural economy, and two factors make access to such loans pressing. First, the rural population is large and credit awareness in rural areas is generally weak, which restricts farmers’ access to credit. Second, the rise in the prices of agricultural inputs in 2021 significantly affected farmers’ livelihoods and their ability to create wealth, because grain prices did not rise in proportion to farmers’ investment. Helping farmers obtain loans, and helping agricultural credit institutions assess farmers’ creditworthiness in order to control risk, have therefore become unavoidable issues. This article first elaborates on the concepts of farmers and farmers’ credit, defines the object of study, and explains the significance of the research. It then reviews basic machine learning concepts and builds an understanding of the data through data cleaning and exploratory data analysis. Next, it uses the random forest algorithm to fill in missing values and, as an innovation, combines two different SMOTE algorithms to build a more effective machine learning model. The optimal model is then obtained through Bayesian hyperparameter tuning, and appropriate metrics are selected to evaluate it. Finally, a scorecard model is built on the existing scoring mechanism, so that the model can make an evidence-based judgment of whether a farmer is likely to default on a loan. Because, in practice, it may 
not be possible to select enough features, the article uses stepwise regression for feature selection, and the model fits well with ten features. The article compares several machine learning algorithms to select the most suitable model and uses two SMOTE methods to handle the imbalanced dataset. After preliminary modeling and parameter tuning, XGBoost is found to be the best model, with an AUC of 0.867 and an accuracy of 0.931. Its accuracy improves on that of the original model, and indicators such as PSI and KS show that it classifies different farmer groups well. The support vector machine also performs well, with a best AUC of 0.861, but it takes a long time to run, so it was not selected for application. The logistic regression model achieves the highest recall on the test sample, at 0.842, but its prediction accuracy is only 0.648. The random forest performs comparatively poorly on recall and precision, possibly because it generalizes weakly to data outside the training set. Based on these results, we chose the XGBoost model to build a pre-loan application model, which gives credit institutions a reference for classifying different borrower groups before lending. Finally, based on the model’s scoring mechanism, this article offers suggestions from the perspectives of both farmers and lending institutions to strengthen the farmers’ credit model and to promote smooth cooperation between credit institutions and farmers in the era of big data: 1. Farmers need to strengthen their credit awareness to avoid loan rejections. 2. More attention should be paid to rural credit evaluation in the context of big data, which helps in building more effective machine learning models. 3. The government should actively participate in farmers’ credit 
building and publicity work and foster a good credit atmosphere. |
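The abstract states that two SMOTE variants are combined but does not say which ones. For readers unfamiliar with the base technique, a minimal sketch of standard SMOTE in plain Python follows. It assumes purely numeric features and Euclidean distance, the function name `smote` is hypothetical, and it illustrates only the single base algorithm, not the combined method used in this article.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Standard SMOTE (hypothetical helper, not the article's combined variant).

    For each synthetic sample: pick a random minority point, pick one of its
    k nearest minority neighbours (Euclidean distance), and place the new
    point at a random position on the line segment between the two.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)          # seeded for reproducible oversampling
    k = min(k, len(minority) - 1)      # cannot have more neighbours than points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority points by distance to x and keep the k nearest.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()             # uniform in [0, 1)
        # Interpolate feature by feature between x and the chosen neighbour.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)])
    return synthetic
```

For example, `smote(defaulters, n_new=6, k=2)` would add six synthetic defaulter rows to a small list of numeric defaulter records. In practice a library implementation is normally used; imbalanced-learn, for instance, provides `SMOTE` and `BorderlineSMOTE`, either of which could be one ingredient of a combined scheme.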
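The abstract reports AUC, KS, and PSI without defining them. The sketch below computes all three in plain Python from predicted default probabilities, assuming that label 1 marks a default; the function names are hypothetical and this is not the author's code. AUC is computed as the probability that a randomly chosen defaulter is scored above a randomly chosen non-defaulter, KS as the largest gap between the two groups' cumulative score distributions, and PSI with equal-width bins over the baseline score range, which is one of several common binning choices.

```python
import math
from typing import Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: share of (defaulter, non-defaulter) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ks_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """KS: maximum |TPR - FPR| over every observed score threshold."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    ks = 0.0
    for t in sorted(set(scores)):
        tpr = sum(1 for s, y in zip(scores, y_true) if y == 1 and s >= t) / n_pos
        fpr = sum(1 for s, y in zip(scores, y_true) if y == 0 and s >= t) / n_neg
        ks = max(ks, abs(tpr - fpr))
    return ks


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """PSI between a baseline score sample and a new one (equal-width bins)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range scores
        return [max(c / len(values), eps) for c in counts]  # eps avoids log(0)

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

For example, with labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, `auc` returns 0.75 and `ks_statistic` returns 0.5. A frequently cited rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant population shift, though institutions set their own thresholds. The pairwise AUC loop is quadratic in the sample size and is meant only for illustration.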
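The abstract does not specify the existing scoring mechanism behind the scorecard. A widely used convention in credit scoring is points-to-double-the-odds (PDO) scaling, sketched below under that assumption. The base score of 600 at good:bad odds of 50:1, the PDO of 20, and the approval cutoff of 560 are illustrative values, not figures from this article, and both function names are hypothetical.

```python
import math


def probability_to_score(p_default: float, base_score: float = 600.0,
                         base_odds: float = 50.0, pdo: float = 20.0) -> float:
    """Map a predicted default probability to scorecard points.

    Assumes PDO scaling (illustrative parameters, not the article's):
    good:bad odds equal to base_odds earn base_score, and every doubling
    of those odds adds pdo points.
    """
    if not 0.0 < p_default < 1.0:
        raise ValueError("p_default must lie strictly between 0 and 1")
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds_good = (1.0 - p_default) / p_default  # higher odds mean lower risk
    return offset + factor * math.log(odds_good)


def decide(p_default: float, cutoff: float = 560.0) -> str:
    """Hypothetical pre-loan rule: approve when the score reaches the cutoff."""
    return "approve" if probability_to_score(p_default) >= cutoff else "reject"
```

With these defaults, a predicted default probability of 1/51 (good:bad odds of 50:1) maps to 600 points, and a probability of 1/101 (odds of 100:1) maps to 620 points. The model's predicted probabilities, for example from XGBoost, would feed into `probability_to_score`; the scaling only re-expresses risk on a familiar points scale and does not change how borrowers are ranked.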