| Big data and machine learning now play an important role in many fields, and the medical industry has made great progress in collecting, processing, analyzing, and applying data. Using medical data to build predictive models that support medical decision-making is therefore of great significance. Hypertension is one of the most difficult chronic diseases to treat worldwide; it is considered the largest contributor to the global burden of disease and is also a major contributor to cardiovascular disease. Predicting the risk of hypertension can help patients achieve prevention and effective treatment before onset or in the early stage of the disease, and can prevent the life-threatening complications caused by its deterioration. Because hypertension data are high-dimensional, intractable, and imbalanced, a simple prediction model struggles to provide accurate predictions. This paper therefore studies the optimization of hypertension risk prediction based on machine learning, using two optimization strategies to improve the random forest model. The main research content of this paper is as follows. First, the hypertension data in the NHANES and CHARLS databases were analyzed. The source and attribute characteristics of the hypertension data are introduced in detail, and the raw data are then preprocessed, including data visualization, data cleaning, and data normalization. Next, to address the imbalance of the hypertension data, the CURE-SMOTE algorithm is used to balance the data, and traditional machine learning algorithms are then used to establish predictive models. Second, the EDE-RF hypertension risk prediction model was established. The random forest model is composed of multiple decision trees, and different numbers of decision trees lead to very different model performance; a suitable subset of feature attributes can
prevent overfitting, increase the diversity among the classification trees, and reduce the correlation between them, which also improves the prediction quality of the random forest model; in addition, the depth of the decision trees in the random forest affects how well the model fits. Selecting the optimal parameter combination can therefore improve the classification accuracy of the model. This paper proposes a differential evolution algorithm based on an elite retention strategy and combines it with the random forest model to construct the EDE-RF prediction model. The model searches for the optimal parameter combination by optimizing three parameters of the random forest algorithm: the number of decision trees, the subset of feature attributes, and the maximum depth. Compared with other optimization algorithms, the results show that the proposed method can accurately and quickly find the optimal solution of the model parameters and effectively improve the performance of the prediction model. Finally, an improved EDE-IRF hypertension risk prediction model was constructed. To address the problem that the EDE-RF model generates decision trees with low accuracy and high similarity during parameter optimization, a screening scheme for decision trees is proposed. Trees are screened according to their classification accuracy and the similarity between them, and the trees that meet the criteria are recombined to form a new random forest model. This method increases the difference between decision trees while retaining the trees with higher accuracy, thereby optimizing the model. Accuracy, sensitivity, precision, F-measure, and the ROC curve are used to evaluate the experimental results. The experimental results show that the improved EDE-IRF model achieves a significant improvement on multiple indicators, predicts the risk of hypertension more accurately, and has high reference value in practical applications. |
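
The abstract does not spell out the elite retention strategy, so the following is only a minimal sketch of one plausible reading: a DE/rand/1/bin differential evolution loop in which parents and trial vectors are pooled and only the fittest individuals survive, so the best solution found so far is never lost. Everything named here is an assumption made for illustration and is not taken from the paper: the search bounds in `BOUNDS`, the control parameters `f` and `cr`, and in particular `toy_fitness`, which stands in for the cross-validated accuracy of a random forest trained with the candidate number of trees, feature-subset size, and maximum depth.

```python
import random

# Hypothetical search bounds for (n_trees, max_features, max_depth); not from the paper.
BOUNDS = [(10, 500), (1, 30), (2, 40)]


def toy_fitness(params):
    """Stand-in for a random forest's cross-validated accuracy (assumption).

    Peaks at 1.0 for (200, 8, 12); replace with a real model evaluation.
    """
    targets = (200, 8, 12)
    penalty = sum(((p - t) / (hi - lo)) ** 2
                  for p, t, (lo, hi) in zip(params, targets, BOUNDS))
    return 1.0 - penalty


def clip_round(vector):
    """Project a real-valued vector onto the integer search box."""
    return tuple(int(min(max(round(v), lo), hi))
                 for v, (lo, hi) in zip(vector, BOUNDS))


def elite_de(fitness, pop_size=20, generations=60, f=0.5, cr=0.9, seed=0):
    """Maximize `fitness` with DE/rand/1/bin plus pooled elite selection."""
    rng = random.Random(seed)
    dims = len(BOUNDS)
    pop = [tuple(rng.randint(lo, hi) for lo, hi in BOUNDS)
           for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    history = []  # best fitness after each generation

    for _ in range(generations):
        trials = []
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + f * (pop[b][d] - pop[c][d])
                      for d in range(dims)]
            # Binomial crossover; j_rand guarantees at least one mutant gene.
            j_rand = rng.randrange(dims)
            trial = [mutant[d] if (rng.random() < cr or d == j_rand)
                     else pop[i][d]
                     for d in range(dims)]
            trials.append(clip_round(trial))
        trial_scores = [fitness(t) for t in trials]

        # Elite retention: pool parents and trials, keep the best pop_size.
        pool = sorted(zip(scores + trial_scores, pop + trials), reverse=True)
        scores = [s for s, _ in pool[:pop_size]]
        pop = [ind for _, ind in pool[:pop_size]]
        history.append(scores[0])

    return pop[0], scores[0], history


best_params, best_score, history = elite_de(toy_fitness)
print("best (n_trees, max_features, max_depth):", best_params)
print("best fitness:", best_score)
```

Because the parents always take part in the selection pool, the best fitness recorded in `history` can never decrease from one generation to the next. In a real experiment the call to `toy_fitness` would be replaced by training and validating a random forest, which is by far the most expensive step, so caching the fitness of parameter combinations that have already been evaluated would be a natural refinement.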