| Osteoporosis has become one of the most serious diseases threatening the life and health of middle-aged and elderly people, and the number of patients is increasing year by year. The most effective way to deal with osteoporosis is early detection, but a shortage of experts and instruments, high examination costs, and radiation exposure from the instruments seriously restrict early diagnosis. It is therefore urgent to build a convenient and accurate risk prediction model that can diagnose osteoporosis early and cost-effectively. With the rapid development of artificial intelligence, the combination of machine learning and medicine has brought new vitality to intelligent healthcare. Compared with traditional clinical decision-making tools, machine learning methods can take into account many easily obtained variables related to osteoporosis, and can therefore identify osteoporosis risk more accurately and conveniently. On this basis, this paper proposes a machine-learning osteoporosis risk prediction model built on bone density images and osteoporosis-related health questionnaire data, using semi-supervised learning for classification and prediction. The model provides a convenient and feasible method for the early diagnosis of osteoporosis, enables high-risk people to enter prevention programs as early as possible, and saves medical resources. To improve data quality, this paper first analyzes and preprocesses the data. Because the osteoporosis data set is noisy, DBSCAN-based noise removal is applied to it. Because the data set contains many missing values and the correlation between the training attributes and the labels is weak, a correlation-coefficient-based partial missing-value filling algorithm, PKNN, is proposed on the basis of KNN to fill the data set. To improve accuracy, the paper considers text data and bone image data in addition to the preprocessed numerical data. Feature selection is used to select the numerical features that are most important for osteoporosis risk prediction, and Word2vec and CNN are used to extract text features and image features, respectively; the three kinds of features are then fused. Finally, because labels for osteoporosis-related data are difficult to obtain while unlabeled questionnaire data is easy to obtain, this paper uses a self-training semi-supervised model with XGBoost as the base classifier and proposes a repeat marking strategy to optimize it, making full use of unlabeled data to improve the generalization performance of the model. The experimental results show that the proposed model achieves high accuracy, that each module contributes a performance improvement, and that the model still classifies well with only a small number of labeled samples. |
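
The self-training idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: a nearest-centroid classifier stands in for XGBoost so that the example needs only the standard library, the function names (`fit_centroids`, `predict_proba`, `self_train`) and the confidence `threshold` are hypothetical, and the repeat marking strategy is not reproduced because the abstract does not specify it.

```python
import math


def fit_centroids(X, y):
    """Stand-in base classifier (assumption): one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict_proba(centroids, row):
    """Pseudo-probabilities: normalised exp(-distance) to each class centroid."""
    scores = {c: math.exp(-math.dist(row, mu)) for c, mu in centroids.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Plain self-training: pseudo-label confident samples, then refit.

    Returns the final model and the samples that never reached the
    confidence threshold.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = fit_centroids(X_lab, y_lab)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for row in pool:
            proba = predict_proba(model, row)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                # Confident prediction: treat it as a label for the next round.
                X_lab.append(row)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(row)
        pool = remaining
        if added == 0:
            break  # no new pseudo-labels, so refitting would change nothing
        model = fit_centroids(X_lab, y_lab)
    return model, pool


# Toy data (made up for illustration): two labeled points, three unlabeled.
labeled_X = [[0.0, 0.0], [5.0, 5.0]]
labeled_y = [0, 1]
unlabeled_X = [[0.2, 0.1], [4.8, 5.1], [2.5, 2.5]]
model, leftover = self_train(labeled_X, labeled_y, unlabeled_X)
print("class centroids:", model)
print("still unlabeled:", leftover)
```

In this sketch, samples that the current model classifies with high confidence are moved into the labeled set and the model is refit, while ambiguous samples stay unlabeled. Replacing the stand-in classifier with a gradient-boosted model would only require swapping the fit and probability functions.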