Objective To build three coronary heart disease recognition models (RBF-DDA neural network, Support Vector Machine, and Random Forest) from chronic disease survey data, to explore how different machine learning methods perform in identifying coronary heart disease for screening, and to assess the feasibility of applying them, in support of improving coronary heart disease screening methods.

Methods Data from the 2012 survey of chronic diseases and risk factors among adults in Jilin Province were divided into a training set and a test set. First, the training set was used to build the RBF-DDA neural network, Support Vector Machine, and Random Forest models for coronary heart disease screening. The input variables of the test set were then entered into each fitted model to predict the output variable, and the predictions were compared with the actual output variable of the test set to assess each model's recognition performance. Next, the parameters of each model were optimized by ten-fold cross-validation. Finally, the three models were rebuilt with their optimal parameters, their screening recognition performance was tested, and the models were compared on accuracy, sensitivity, specificity, and other indicators.

Results
1. The optimal parameters of the RBF-DDA neural network were an activation threshold of 0.5 and a suppression threshold of 0.5. When the model's predictions of whether residents in the test set had coronary heart disease were compared with the actual values, accuracy was 55.83%, sensitivity 58.88%, specificity 55.46%, and G-Mean 57.14%. The RBF-DDA neural network performed better than the traditional RBF neural network.
2. Four Support Vector Machine models were built, one for each of four kernel functions: linear-SVM, radial-SVM, sigmoid-SVM, and polynomial-SVM. The optimal parameters were cost = 0.5 for linear-SVM; cost = 2, gamma = 0.01 for radial-SVM; cost = 4, gamma = 0.001, coef0 = 0.25 for sigmoid-SVM; and cost = 8, gamma = 0.001, coef0 = 0.25, degree = 3 for polynomial-SVM. After parameter optimization, each of the four models reached its best recognition results. For linear-SVM, accuracy was 68.03%, sensitivity 76.45%, specificity 67.01%, and G-Mean 71.57%. For radial-SVM, accuracy was 65.32%, sensitivity 77.57%, specificity 63.84%, and G-Mean 70.37%. For sigmoid-SVM, accuracy was 67.93%, sensitivity 77.94%, specificity 66.71%, and G-Mean 72.11%. For polynomial-SVM, accuracy was 67.58%, sensitivity 79.07%, specificity 66.19%, and G-Mean 72.34%.
3. Two parameter optimization methods were used for the Random Forest model: manual tuning and ten-fold cross-validation. The optimal parameters were mtry = 6, ntree = 300 under manual tuning and mtry = 6, ntree = 290 under ten-fold cross-validation. For the manually tuned Random Forest model, accuracy was 66.86%, sensitivity 81.50%, specificity 65.08%, and G-Mean 72.83%. For the model tuned by ten-fold cross-validation, accuracy was 66.49%, sensitivity 80.56%, specificity 64.79%, and G-Mean 72.24%.
4. Comparing the overall recognition performance of the three models, Random Forest was the best, with an accuracy of 66.86%, sensitivity of 81.50%, specificity of 65.08%, and G-Mean of 72.83%. In accuracy and specificity there was little difference between Random Forest and Support Vector Machine, and both were better than the RBF-DDA neural network. In sensitivity, Random Forest was the best: 2.43 percentage points higher than Support Vector Machine and 22.62 percentage points higher than the RBF-DDA neural network. In G-Mean, Random Forest was 0.49 percentage points higher than Support Vector Machine and 15.69 percentage points higher than the RBF-DDA neural network.

Conclusion
1. All three models (RBF-DDA neural network, Support Vector Machine, and Random Forest) are feasible for recognizing coronary heart disease from chronic disease survey data.
2. Parameter optimization improved the coronary heart disease recognition performance of the RBF-DDA neural network, Support Vector Machine, and Random Forest models.
3. Among the three models, Random Forest recognized coronary heart disease best, followed by Support Vector Machine, with the RBF-DDA neural network performing worst.
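
The abstract reports G-Mean without defining it. Assuming the usual definition, the geometric mean of sensitivity and specificity, the reported figures can be reproduced to within rounding. The sketch below is illustrative only: the function names are hypothetical, and the confusion-matrix counts in the usage note are made up for demonstration, not taken from the survey.

```python
import math


def screening_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity and G-Mean as fractions.

    tp/fn/tn/fp are confusion-matrix counts for a binary screening model.
    G-Mean is assumed to be sqrt(sensitivity * specificity).
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
    }


def g_mean_percent(sensitivity_pct, specificity_pct):
    """Geometric mean of two percentages, returned as a percentage."""
    return math.sqrt(sensitivity_pct * specificity_pct)


# (sensitivity %, specificity %, reported G-Mean %) as stated in the Results.
reported = {
    "RBF-DDA": (58.88, 55.46, 57.14),
    "polynomial-SVM": (79.07, 66.19, 72.34),
    "Random Forest (manual tuning)": (81.50, 65.08, 72.83),
}

for name, (sens, spec, stated) in reported.items():
    computed = g_mean_percent(sens, spec)
    print(f"{name}: computed {computed:.2f}%, reported {stated:.2f}%")
```

With hypothetical counts, `screening_metrics(tp=80, fn=20, tn=60, fp=40)` gives a sensitivity of 0.8, a specificity of 0.6, and an accuracy of 0.7. Because G-Mean penalizes a model that does well on only one class, it is a convenient single figure for comparing screening models when cases are much rarer than non-cases.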