| Background: Artificial intelligence (AI) is a branch of applied computer science in which computer algorithms are trained to perform tasks associated with human intelligence (HI), such as visual perception, speech recognition, decision support, reasoning, and prediction. With the advent of the information age, how to use AI to process massive amounts of information and derive effective medical data and treatment plans has become a research hotspot. Machine learning (ML), a major branch of AI, has shown considerable advantages in the medical field: it can give people more efficient and convenient access to high-quality medical services while saving costs. Osteoporosis (OP) is a disease closely related to ageing and unhealthy lifestyles, and tends to occur in postmenopausal women and in middle-aged and elderly men. With population ageing and continuing changes in lifestyle, the incidence of OP has risen sharply, and OP has become an important public health and economic problem in China. The onset and progression of OP are usually silent, and the disease often goes unnoticed until serious complications such as fractures and spinal deformities occur, so it is also called the "invisible epidemic". Unfortunately, the mechanism of bone loss in the Chinese population is not yet clear, and existing OP risk assessment tools still have major limitations. In addition, OP is strongly influenced by region and ethnicity, and targeted, localized OP prediction tools are still lacking. Objective: To analyze the general status of bone metabolism in Chinese community residents based on two-center cross-sectional survey data. Least absolute shrinkage and selection operator (LASSO) regression analysis was used to explore the risk factors for OP after 3 years in middle-aged and elderly male subjects (age ≥ 40 years) and postmenopausal female subjects. OP risk prediction models were constructed using 7 novel ML algorithms, and the ML algorithm with the best
predictive performance was selected for comparison with traditional regression analysis. The application value of the two methods in predicting the risk of chronic diseases with complex variables and few outcome events was analyzed. Finally, the optimal ML algorithm was used to build a web-based visual OP risk calculation model to help clinicians and patients in remote rural areas identify and address risk factors early, before OP occurs. Methods: Part Ⅰ: Based on the two-center cross-sectional survey data, we explored the relationship between OP and endocrine and metabolic indicators such as body mass index (BMI) and metabolic syndrome (MS), and analyzed the basic bone metabolic health status of the Chinese population. Confounding was controlled using univariate and multivariate logistic regression (LR) analysis and stratified analysis. Part Ⅱ: Based on single-center retrospective cohort data, LASSO regression analysis was used to explore the risk factors for OP after 3 years in middle-aged and elderly male and postmenopausal female subjects, and the risk factors were visualized as a nomogram. The nomogram model was evaluated using the area under the receiver operating characteristic curve (AUROC), calibration curves, decision curve analysis (DCA), and the C-index to assess the predictive value of these risk factors. Part Ⅲ: The feature variables selected by LASSO regression analysis in Part Ⅱ were used as the modeling variables. OP risk prediction models were built using 7 novel ML algorithms: extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), adaptive boosting (AdaBoost), random forest (RF), support vector machine (SVM), multilayer perceptron (MLP), and k-nearest neighbors (KNN). The best ML model was selected by comparing AUROC values and was then optimized using 10-fold cross-validation.
Part Ⅳ: The differences in variable selection between LASSO regression analysis and traditional regression analysis were compared. With the modeling variables held the same, the predictive performance of the best ML algorithm and the traditional binary LR method in constructing the OP risk model was compared. Finally, the ML algorithm and Shapley additive explanations (SHAP) were combined to construct a web-based visual online calculation model for OP risk. Results: 1. From the two-center cross-sectional survey database, a total of 9429 subjects were included in the data analysis, comprising 4154 male subjects and 5275 female subjects. Among male subjects, 1082 (26.05%) had osteopenia and 168 (4.04%) had OP; among female subjects, 1178 (22.33%) had osteopenia and 251 (4.76%) had OP. Univariate and multivariate LR analysis showed that age and BMI were independent risk factors for OP. To further control for confounding, we grouped subjects by BMI and adjusted for variables such as age, blood lipids, blood glucose, blood pressure, diet, and exercise patterns in a stratified analysis. The results showed that among female subjects in the age group of 50 (45-57) years, the risk of OP was significantly increased in those who were underweight (BMI < 18.5 kg/m²), all P < 0.05; among male subjects aged 53 (46-62) years, overweight subjects (BMI 24-28 kg/m²) had a significantly lower risk of developing OP, all P < 0.05. When exploring the relationship between OP and MS and its components, we found that male patients with waist circumference ≥ 100 cm had an increased risk of developing OP, whereas other components such as hyperglycemia and hypertension showed no significant correlation with OP. Similarly, in female subjects, after adjusting for variables and controlling for confounding, no significant relationship was found between MS or its components and OP. 2. In the single-center retrospective cohort study database, follow-up was approximately 3 years. A total of 3037 subjects without OP at baseline were finally
included in the data analysis, comprising 1834 middle-aged and elderly male subjects and 1203 postmenopausal female subjects. Among middle-aged and elderly male subjects, 10 feature variables with non-zero coefficients were selected from 44 variables by LASSO regression analysis: age, neck circumference, waist-to-height ratio, BMI, triglyceride, impaired fasting glucose (IFG), dyslipidemia, osteopenia, smoking history, and high-intensity physical exercise. The AUROC of the nomogram model built from these factors was 0.882 (95% CI: 0.858-0.907). In postmenopausal female subjects, 6 feature variables with non-zero coefficients were selected from 49 variables by LASSO regression analysis: pulse, hip circumference, BMI, impaired glucose tolerance (IGT), osteopenia, and number of deliveries. The AUROC of the nomogram model built from these factors was 0.852 (95% CI: 0.815-0.889). DCA showed that, for threshold probabilities between 1% and 100%, using the nomogram to predict the risk of OP after 3 years in middle-aged and elderly male and postmenopausal female subjects was clinically useful. 3. We used 7 popular ML algorithms to construct OP risk prediction models for middle-aged and elderly men and postmenopausal women. The XGBoost model had the largest AUROC in both the training set and the validation set and was therefore the best prediction model. We optimized the XGBoost model through 10-fold cross-validation. Among middle-aged and elderly male subjects, the AUROC of the optimized XGBoost model was 0.885 (95% CI: 0.850-0.920) in the training set, 0.858 (95% CI: 0.749-0.967) in the validation set, and 0.861 (95% CI: 0.818-0.903) in the test set; in postmenopausal female subjects, it was 0.865 (95% CI: 0.816-0.915) in the training set, 0.846 (95% CI: 0.701-0.982) in the validation set, and 0.823 (95% CI: 0.758-0.887) in the test set. 4. Based on the analysis results of Parts II and
III, we chose the XGBoost model as the best ML model and compared the optimized XGBoost model with the traditional regression model. Using the same study population as in Part II, we performed traditional univariate and multivariate LR analysis for feature variable selection. The selected variables were modeled using regression analysis and visualized as a nomogram. Among middle-aged and elderly male subjects, 3 feature variables were selected: osteopenia, smoking history, and high-intensity physical exercise. The AUROC of the nomogram model constructed from these variables was 0.857 (95% CI: 0.829-0.886), and the internal validation C-index was 0.856. In postmenopausal female subjects, 6 feature variables were selected: BMI, IGT, obesity, osteopenia, age at menarche, and number of deliveries. The AUROC of the nomogram model constructed from these variables was 0.860 (95% CI: 0.826-0.895), and the internal validation C-index was 0.848. In terms of variable selection, LASSO regression analysis identified variables that traditional LR analysis did not; in terms of model performance, when the same variables were used, ML predicted as well as traditional regression analysis. Finally, we combined LASSO regression analysis, the XGBoost algorithm, and the SHAP method to construct separate web-based visual online calculation models for the risk of OP in middle-aged and elderly men and in postmenopausal women. Given the input variable values, the model can calculate at any time the probability of OP after 3 years for the applicable population and compare it with the incidence threshold, so as to offer personalized treatment recommendations. Conclusion: 1. This study showed that BMI was closely related to the occurrence of OP. For middle-aged and elderly people, maintaining an adequate body mass and performing appropriate resistance exercise can reduce bone loss and help prevent OP. 2. This study showed that the OP
risk factors selected by LASSO regression analysis had strong clinical application value. In clinical work, LASSO regression analysis can be a good variable selection method for studies with small to medium sample sizes and complex variables. 3. In constructing the OP risk prediction model, the XGBoost model showed better prediction performance than the other commonly used ML algorithms (LightGBM, RF, AdaBoost, SVM, MLP, and KNN) in both the training and validation sets. As a well-performing ML algorithm, XGBoost can provide methodological guidance for the subsequent construction of models for other chronic diseases. 4. ML algorithms predicted the risk of chronic disease with performance comparable to that of traditional regression analysis. ML also has good application value for data with small to medium sample sizes, many complex variables, and few positive events. The web-based online calculation model built with ML can predict a patient's risk of OP at any time and is convenient, economical, and practical. This study provides methodological support and practical guidance for the health management of bone metabolism and early screening of OP in middle-aged and elderly men and postmenopausal women. We believe that the application of ML algorithms can help primary care providers identify the risk of OP at an early stage, improve the screening and treatment of OP, and provide reference and guidance for the management of other chronic diseases. |
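
The abstract compares models by AUROC and assesses clinical usefulness by DCA. The following is a minimal sketch of how these two quantities are commonly computed, not the authors' code. The function names `auroc` and `net_benefit` and the toy labels and scores are hypothetical and chosen only for illustration. It assumes AUROC is taken as the probability that a randomly chosen case with OP receives a higher predicted risk than a randomly chosen case without OP, and that DCA uses the standard net benefit at a threshold probability.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit at one threshold probability.

    net benefit = TP/n - (FP/n) * threshold / (1 - threshold),
    where a case is treated as high risk when its score >= threshold.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = developed OP, 0 = did not; scores are predicted risks.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]

print(f"AUROC: {auroc(toy_labels, toy_scores):.3f}")
print(f"Net benefit at 30% threshold: {net_benefit(toy_labels, toy_scores, 0.30):.3f}")
```

The pairwise loop is quadratic in the number of subjects, which is acceptable for a sketch; on cohorts of the size reported here a rank-based implementation from an established statistics library would be the more usual choice.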