Big data technology has developed rapidly in recent years, and the medical field is gradually moving towards informatization, so the integration of the two has become a research hotspot. Medical data have considerable research value and great potential for disease prediction and prognosis. However, medical data sets often suffer from small sample sizes: there is not enough data to train traditional deep learning models. Studying the small-sample problem in medical big data therefore has both research value and practical significance.

This research proposes a problem feature space search method based on medical experience for small-sample data sets in the medical field. By selecting different feature subsets as analysis variables and target variables, and drawing on the experience of professional physicians, we can identify medical problems with research value. In addition, we propose a small-sample prediction and interpretation model based on SHAP feature selection. First, the data are preprocessed, and a random forest model together with the SHAP method is used to obtain candidate feature subsets. An XGBoost model then compares the performance of the candidate subsets and selects the best one; this feature subset, combined with the judgment of professional physicians, serves as the interpretive feature set for the small-sample problem. Finally, this feature set is used to train the model, and SHAP is used to analyze how each feature affects the model's output.

Using the problem feature space search method based on medical experience, we identified medical problems with research value and conducted experiments on the question of "the relationship between the CT data of patients on admission and discharge and the residual lesions three months after discharge". The candidate feature subset generated by SHAP was screened by professional physicians, yielding 26 indicators for subsequent experiments. XGBoost and LightGBM models trained on these indicators reach an AUC of about 0.9, indicating that the feature selection is effective. Finally, SHAP analysis identifies the medical indicators that influence the model's predictions, such as the real lesion volume and the proportion of functional lung tissue volume within the lesion, which can support follow-up research by medical staff.
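The SHAP-based feature-selection pipeline described above can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The data are synthetic, and the feature names are hypothetical labels loosely echoing the indicators mentioned above. A plain logistic regression stands in for both the random forest and XGBoost. SHAP values are computed exactly by enumerating feature coalitions against a background sample (the interventional Shapley definition), rather than with the `shap` library. Candidate subsets are assumed to be the top-k features by mean |SHAP value|, and the physicians' screening step is not modelled.

```python
import itertools
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def shapley_values(f, x, background):
    """Exact interventional Shapley values of model f at row x.

    v(S) = mean over background rows b of f(x on features in S, b elsewhere).
    All 2^d coalitions are enumerated, so this only suits a handful of features.
    """
    d = len(x)
    cache = {}

    def value(coalition):
        key = frozenset(coalition)
        if key not in cache:
            total = sum(f([x[j] if j in key else b[j] for j in range(d)]) for b in background)
            cache[key] = total / len(background)
        return cache[key]

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            weight = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (value(set(subset) | {i}) - value(subset))
    return phi


def auc(scores, labels):
    """Rank-based AUC: chance a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X, y, cols, epochs=200, lr=0.1):
    """Batch gradient-descent logistic regression using only the columns in `cols`."""
    w, b, n = [0.0] * len(cols), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(cols), 0.0
        for row, t in zip(X, y):
            err = sigmoid(sum(wk * row[c] for wk, c in zip(w, cols)) + b) - t
            for k, c in enumerate(cols):
                gw[k] += err * row[c]
            gb += err
        w = [wk - lr * g / n for wk, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda row: sigmoid(sum(wk * row[c] for wk, c in zip(w, cols)) + b)


# Hypothetical indicator names and synthetic labels: NOT the study's data.
FEATURES = ["lesion_volume", "functional_lung_ratio", "age", "noise"]
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in FEATURES] for _ in range(200)]
y = [1 if 1.5 * r[0] - 1.0 * r[1] + 0.3 * r[2] + rng.gauss(0, 0.5) > 0 else 0 for r in X]
train_X, test_X, train_y, test_y = X[:150], X[150:], y[:150], y[150:]
d = len(FEATURES)

# Step 1: reference model on all features (stand-in for the random forest).
full_model = fit_logistic(train_X, train_y, list(range(d)))

# Step 2: rank features by mean |SHAP value| over a few explained rows.
background, explained = train_X[:20], test_X[:20]
importance = [0.0] * d
for row in explained:
    for j, v in enumerate(shapley_values(full_model, row, background)):
        importance[j] += abs(v) / len(explained)
ranking = sorted(range(d), key=lambda j: importance[j], reverse=True)

# Step 3: candidate subsets are the top-k features; keep the best held-out AUC
# (a logistic model stands in for XGBoost; ties go to the smaller subset).
results = []
for k in range(1, d + 1):
    cols = ranking[:k]
    model = fit_logistic(train_X, train_y, cols)
    results.append((auc([model(r) for r in test_X], test_y), [FEATURES[c] for c in cols]))
best_auc, best_subset = max(results, key=lambda r: r[0])
print(best_subset, round(best_auc, 3))
```

Exact enumeration costs on the order of 2^d model evaluations per explained row. With the 26 indicators used in the study, that becomes impractical, which is one reason tree-specific explainers such as TreeSHAP (`shap.TreeExplainer`) are typically paired with random forest and gradient-boosted models.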