Objective: Hypertension is a common non-communicable disease and a risk factor for cardiovascular diseases (CVD) such as stroke and heart failure. It is typically asymptomatic and detected only through opportunistic screening. Early identification and diagnosis of hypertension are therefore of great significance for preventing hypertension and its complications. This study selected influencing factors based on personal characteristics and medical indicators, and constructed identification models and trajectory models based on cross-sectional and longitudinal data, respectively. The study aimed to compare the performance of identification models built with different algorithms, capture the development trends of blood pressure in different trajectory groups, evaluate the importance of the various factors in the model, identify individual characteristics associated with different trajectory patterns, and explore the dynamic links between hypertension and related disorders over time.

Methods: The first part collected data from 2227 patients with hypertension and 7682 healthy control participants at the Health Management Centre of Drum Tower Hospital from January 2020 to December 2020. Factor analysis of mixed data (FAMD) was applied to impute missing data, and the synthetic minority oversampling technique (SMOTE) was used to address class imbalance in the dataset. Univariate analysis was conducted to select the factors. Using the screened factors as independent variables and blood pressure status as the dependent variable, we established identification models with logistic regression, back propagation neural network (BPNN), random forest (RF), support vector machine (SVM) and extreme gradient boosting (XGBoost) algorithms. The hold-out method was used to evaluate model performance. Sensitivity, specificity, positive predictive value (PPV), accuracy, G-mean, F1 score, Matthews correlation coefficient (MCC) and the area under the receiver operating characteristic curve (AUC) were used to comprehensively evaluate and compare the performance of the different models. Feature importance was ranked according to accuracy, the Gini index and Shapley values. Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) were used to explain the model. The second part collected annual follow-up data from 67936 physical examination participants at the Health Management Centre of Drum Tower Hospital from January 2008 to December 2020, of whom 49570 did not develop hypertension during the study period. Group-based trajectory modeling (GBTM) was used to identify systolic blood pressure (SBP) and diastolic blood pressure (DBP) trajectories in the general population (n=67936) and in the population without hypertension (n=49570), and to explore the relationships between influencing factors and trajectories. Linear regression was applied to impute missing covariate values. As a sensitivity analysis, the analyses were repeated on the imputed data to verify the stability of the models. The associations among hypertension, diabetes mellitus (DM) and CVD in the general physical examination population were analyzed with a group-based dual trajectory model.

Results: In the first part, 17 variables were selected to construct the models: age, sex, smoking status, waist circumference (WC), body mass index (BMI), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), total cholesterol (TC), triacylglycerol (TG), glycosylated hemoglobin (HbA1c), fasting blood glucose (FBG), uric acid (UA), chronic kidney disease, DM, hyperlipidemia, CVD and family history of CVD (male relatives). With a sensitivity of 86.8%, specificity of 88.5%, PPV of 88.1%, accuracy of 87.6%, G-mean of 0.876, F1 score of 0.875, MCC of 0.753 and AUC of 0.949, the RF model performed best among the five models. Age was the most important variable in the model, followed by FBG, while chronic kidney disease and sex contributed the least. The results of the second part showed that
four distinct trajectory patterns of SBP and DBP were identified in both the general population and the population without hypertension. In the population without hypertension, the four SBP trajectories were the normal-slow growth group (21.80%), normal-rising group (43.50%), prehypertension-rising group (28.80%) and prehypertension-rapid rising group (5.90%); the four DBP trajectories were the normal-slow rising group (23.10%), normal-rising group (43.60%), prehypertension-rising group (28.30%) and prehypertension-rapid rising group (5.00%). All blood pressure trajectories in the population without hypertension showed an increase in blood pressure over time. Older men were more likely to be assigned to trajectory groups with higher blood pressure levels. BMI, pulse, TG and FBG were correlated with upward changes in all four trajectories of both SBP and DBP (β>0, P<0.05). In the general population, the four SBP trajectories were the normal-rising group (33.65%), prehypertension-rising group (41.32%), grade 1 hypertension-rising group (19.96%) and grade 2 hypertension-increasing-declining group (5.07%); the four DBP trajectories were the normal-slow rising group (27.99%), normal-rising group (41.68%), prehypertension-rising group (24.80%) and hypertension-stable group (5.53%). Except for the grade 2 hypertension-increasing-declining group of the SBP model and the hypertension-stable group of the DBP model, all trajectory groups in the general population showed an increase in blood pressure over time. Older women were more likely to belong to trajectory groups with higher SBP levels, and older men to trajectory groups with higher DBP levels. BMI, pulse, TC, TG, FBG and UA were correlated with upward changes in SBP and DBP in all four trajectory groups (β>0, P<0.05). For the occurrence risks of hypertension, DM and CVD in the general population, three, two and two trajectory patterns were identified, respectively. The three hypertension trajectories were the persistent low-risk
group (81.44%), increased-risk group (5.01%) and persistent high-risk group (13.55%); the two DM trajectories were the persistent low-risk group (94.91%) and increased-risk group (5.09%); and the two CVD trajectories were the persistent low-risk group (95.14%) and increased-risk group (4.86%). The occurrence risks of hypertension, DM and CVD were positively associated with one another.

Conclusion: Machine learning (ML), especially the RF algorithm, performed well in identifying hypertension, indicating its potential for constructing rapid, non-invasive predictive models to identify patients with hypertension in the future. This study also provides experience for developing accurate hypertension identification models based on multiple factors. The importance of age in the hypertension models suggests a critical role for early prevention of hypertension. Trajectory analysis revealed heterogeneity in blood pressure, and age, sex and other characteristics differed among trajectory groups in both the general population and the population without hypertension. The dual trajectory analysis highlighted the comorbidity of hypertension with CVD and other related diseases, providing a basis for multifactorial, subgroup-specific early prevention and treatment of hypertension.
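The threshold-based metrics used to compare the identification models all derive from the 2x2 confusion matrix of predicted versus actual blood pressure status, and AUC can be estimated from predicted scores as the probability that a randomly chosen hypertensive case scores higher than a randomly chosen control (ties counted as one half). The Python sketch below illustrates these standard definitions; the function names `binary_metrics` and `rank_auc` and the counts in the example are hypothetical and are not the study's code or data.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard threshold-based metrics from a 2x2 confusion matrix (illustrative sketch)."""
    sensitivity = tp / (tp + fn)                  # true positive rate (recall)
    specificity = tn / (tn + fp)                  # true negative rate
    ppv = tp / (tp + fp)                          # positive predictive value (precision)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    g_mean = math.sqrt(sensitivity * specificity) # geometric mean of the two rates
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "ppv": ppv,
            "accuracy": accuracy, "g_mean": g_mean, "f1": f1, "mcc": mcc}


def rank_auc(pos_scores, neg_scores) -> float:
    """Mann-Whitney estimate of AUC: share of (case, control) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(binary_metrics(tp=90, fp=20, tn=80, fn=10))
    print(rank_auc([0.9, 0.8, 0.4], [0.3, 0.5]))
```

As a consistency check on the reported RF results, sqrt(0.868 × 0.885) ≈ 0.876, matching the reported G-mean; the F1 score recomputed from the rounded PPV and sensitivity is ≈ 0.874, consistent with the reported 0.875 up to rounding.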
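SMOTE balances the classes by creating synthetic minority-class records on the line segments between a minority record and one of its k nearest minority-class neighbours. The following is a minimal SMOTE-style sketch for numeric features only, assuming Euclidean distance, k = 5 by default and random choice of the seed record; these settings, the function name `smote` and its parameters are illustrative assumptions, since the abstract does not report the SMOTE configuration or how categorical variables such as sex and smoking status were handled.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling sketch: numeric features, Euclidean k-NN (assumed settings)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority-class neighbours of every minority record.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))          # pick a seed record at random
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]  # pick one of its k neighbours
        gap = rng.random()                        # uniform interpolation factor in [0, 1)
        # New point lies on the segment between x and its neighbour.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

Libraries such as imbalanced-learn provide tested implementations, including SMOTENC for mixed categorical and numeric data. As a general precaution, oversampling is usually applied only to the training portion of a hold-out split, so that synthetic records do not appear in the test set.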
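In GBTM, each participant is assigned to the trajectory group with the highest posterior probability of membership, computed from the group proportions and each group's polynomial trajectory over time. The sketch below shows only this assignment step under simplifying assumptions: normally distributed blood pressure with a common standard deviation (GBTM software often uses a censored normal model for such outcomes), and hypothetical group proportions, polynomial coefficients and readings. Estimating those parameters by maximum likelihood and choosing the number of groups (commonly by BIC) are not shown.

```python
import math


def group_mean(coefs, t):
    """Polynomial trajectory b0 + b1*t + b2*t^2 + ... evaluated at time t."""
    return sum(b * t ** p for p, b in enumerate(coefs))


def posterior_membership(times, values, groups, sigma):
    """Posterior P(group j | readings), assuming normal errors with common sigma.

    groups: list of (pi_j, coefs_j) pairs, where pi_j is the group proportion
    and coefs_j the polynomial coefficients (hypothetical values in the example).
    """
    log_w = []
    for pi, coefs in groups:
        ll = math.log(pi)
        for t, y in zip(times, values):
            z = (y - group_mean(coefs, t)) / sigma
            ll += -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))
        log_w.append(ll)
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]  # log-sum-exp trick for stability
    total = sum(w)
    return [x / total for x in w]


if __name__ == "__main__":
    # Hypothetical SBP groups: "normal" and "prehypertension", linear in years.
    groups = [(0.7, [115.0, 0.5]), (0.3, [130.0, 1.0])]
    post = posterior_membership([0, 1, 2], [128.0, 131.0, 133.0], groups, sigma=8.0)
    print(post, "-> assigned group", post.index(max(post)))
```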