| With the accelerating aging of the population, unhealthy lifestyles and the pressures of a fast-paced society, more and more people are developing health problems. Chronic diseases, hypertension in particular, have become one of the major disease burdens in China. Although many studies have explored the treatment and prevention of hypertension, analyzing the factors that influence it remains a serious challenge. In recent years, healthcare information technology and digital management systems have developed rapidly, and machine learning techniques are increasingly applied to the analysis of hypertension influencing factors in order to identify potential risk factors and predict the likelihood of relapse. These methods contribute to a better understanding of the underlying mechanisms of hypertension and to the application of machine learning in this field. First, physical examination data from a hospital showed that the prevalence of hypertension differed among populations, so the subjects were divided into four groups: male, female, young and middle-aged. The data set was then preprocessed by removing irrelevant variables, deleting or filling missing values, identifying outliers and encoding the data, and a descriptive statistical analysis was performed. Next, four feature selection methods, namely Lasso, XGBoost, random forest and SVM-RFE, were used to screen the features and select the optimal subset. To address the data imbalance, the SMOTEENN combined sampling method was applied, after which four algorithms (random forest, SVM, neural network and XGBoost) were used for prediction, with AUC, accuracy and F1 score used to compare the models. The experiments show that the optimal models for the young, middle-aged, male and female groups were Random Forest & SVM, XGBoost & XGBoost, XGBoost & XGBoost and XGBoost & XGBoost, respectively. Finally, based on the optimal models of the different populations, a Stacking ensemble was built. The first layer uses random forest, XGBoost, SVM and a neural network as base learners, and the second layer uses logistic regression as the meta-learner. A comparison of AUC, accuracy and F1 score shows that the Stacking fusion model does not necessarily improve predictive performance for every population: the optimal models for the young and female groups are both Stacking fusion models, while the optimal models for the middle-aged and male groups remain XGBoost & XGBoost. Moreover, the feature importances of the optimal models for the four populations were ranked, and cross-sectional comparisons were made to identify the similarities and differences between different populations of hypertensive patients. More attention should be paid to systolic blood pressure, percentage of eosinophils, alkaline phosphatase, diastolic blood pressure, urea nitrogen, body weight, triglycerides and fasting blood glucose, which provides some reference value for future hypertension control. |
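The two-layer Stacking structure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the study's implementation: the data are synthetic (generated with `make_classification`), scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example depends on scikit-learn alone, the SMOTEENN resampling and feature selection steps are omitted, and all hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic, imbalanced data standing in for the examination records.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# First layer: four base learners (gradient boosting is a stand-in for XGBoost).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ("mlp", make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
    )),
]

# Second layer: logistic regression trained on out-of-fold base predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

# Evaluate with the three metrics named in the abstract.
proba = stack.predict_proba(X_test)[:, 1]
pred = stack.predict(X_test)
scores = {
    "auc": roc_auc_score(y_test, proba),
    "accuracy": accuracy_score(y_test, pred),
    "f1": f1_score(y_test, pred),
}
print(scores)
```

Fitting each base learner separately on the same split and computing the same three metrics would give the kind of comparison the abstract reports, in which the stacked model is not guaranteed to beat the best single model.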