| Objective: In this study, a variety of machine learning algorithms were used to construct standardized approaches to syndrome type classification and tongue image acquisition. Combined with clinical data, a hybrid deep neural network model for the diagnosis and prediction of type 2 diabetic nephropathy was then established, and the practical significance of traditional Chinese medicine (TCM) syndromes and tongue images in disease risk prediction was explored through model comparison and evaluation. Methods: 1. Data collection: Patients with type 2 diabetes were included according to the corresponding diagnostic criteria. The syndrome types of all patients were labeled using a combined method of "symptoms, syndrome elements, syndrome differentiation guidelines and expert experience". Tongue images were collected with a unified image acquisition device according to a unified standard. Comprehensive clinical data were collected, including general information and auxiliary examination indicators. According to the diagnostic criteria for diabetic nephropathy, patients were labeled as non-diabetic nephropathy or diabetic nephropathy. 2. Data preprocessing: The data were filtered and integrated; abnormal, duplicate and erroneous records were removed; and the format was standardized. Missing values were filled with the feature mean. The indicators were normalized: binary variables were one-hot encoded and continuous variables were regularized. Principal component analysis or exploratory factor analysis was used to reduce the feature dimension, and the classic shuffle algorithm was used to randomize the data order so that the data were uniformly distributed. The data were then divided into a training set and a test set at a ratio of 8:2. 3. Construction of the syndrome classification model: Symptom items with a frequency of less than 10% were screened out. The remaining information from the four diagnostic methods of traditional Chinese medicine was analyzed by exploratory factor analysis. The common factors
after dimensionality reduction were then used with support vector machine, decision tree, multinomial naive Bayes, k-nearest neighbor, bagging k-nearest neighbor, bagging decision tree, random forest, adaptive boosting, gradient boosting decision tree, artificial neural network and other machine learning algorithms to build syndrome classification models for type 2 diabetes, and the prediction accuracy of the models was compared to judge their classification performance. 4. Construction of the automatic tongue image segmentation model: The patients' tongue images were annotated with the labelme software, based on Python 3.6. A multi-task convolutional neural network was used to build the model for detecting and locating the tongue region boundary. The tongue body was extracted from the image with the medical image segmentation algorithm Attention U-Net. The mean intersection over union (mIoU, 91.05%), pixel accuracy (PA) and other indexes were calculated against the ground truth images to evaluate the tongue body segmentation results. 5. Multimodal feature fusion of traditional Chinese and Western medicine: Clinical data, including the patients' general information and auxiliary examination indexes, were preprocessed and reduced in dimension, and the same set of machine learning algorithms was used to build model 1. The syndrome data were one-hot encoded as categorical variables and integrated into model 1, and the same set of machine learning algorithms was used to build model 2. The tongue image data were then fused with a weight of 0.2-0.4, and model 3 was constructed by deep learning. Accuracy, specificity and sensitivity were used to evaluate the prediction performance of the models. Results: 1. Distribution of general characteristics: 868 patients with type 2 diabetes were included, of whom 521 (60.02%) were male and 347 (39.98%) were female. The average age of the patients was 56.2 ± 11.84 years
old, with the 61-70 age group being the largest. By BMI, overweight and obese patients accounted for 52.99%. 2. Distribution of main symptoms and syndromes: 29 symptoms (excluding tongue and pulse signs) had a frequency greater than 10%. Among the pulse conditions, 6 had a frequency greater than 10%. Among the syndrome types, deficiency of both Qi and Yin was the most common (151 cases, 17.40%); deficiency of both Qi and Yin with blood stasis accounted for 149 cases (17.17%). Comparing patients with and without kidney disease, the most common syndrome types in type 2 diabetes without kidney disease were Qi and Yin deficiency (19.94%), Qi and Yin deficiency with blood stasis (17.85%), and liver and kidney Yin deficiency (15.76%), whereas the most common types with kidney disease were Qi and Yin deficiency with blood stasis (16.26%), liver and kidney Yin deficiency with blood stasis (14.63%), and liver and kidney Yin deficiency (12.60%). 3. Syndrome element distribution: after symptom items with a frequency of less than 10% were removed, 42 symptoms remained. Exploratory factor analysis performed better than principal component analysis. When the 42 symptom indexes were reduced in dimension, extracting 15 common factors gave the best result, with a cumulative variance contribution rate of 67.5229%. The 15 common factors involved, from high to low, the liver, kidney, stomach, heart and spleen. The disease syndrome elements, from high to low frequency, were hyperactivity of heat, Yin deficiency, Qi deficiency, Yang deficiency, blood stasis, phlegm turbidity and blood deficiency. 4. Syndrome classification prediction model: the accuracy was 62.65% for SVM, 61.18% for decision tree, 77.06% for multinomial naive Bayes, 64.12% for k-nearest neighbor, 74.12% for bagging k-nearest neighbor, 68.53% for bagging decision tree, 75.36% for random forest, 56.48% for adaptive boosting, 79.06% for gradient boosting decision tree and 87.70% for artificial neural network. 5. Tongue image segmentation model: a multi-task
convolutional neural network (MTCNN) was used to construct a cascaded CNN architecture combining three networks (P-Net, R-Net and O-Net). The average precision of boundary detection at the 60% threshold (AP60) was 59.5% and the intersection over union (IoU) was 93.2%, significantly better than the VJ face detection algorithm, the histogram of oriented gradients (HOG) algorithm and the deformable part model (DPM) algorithm. The mean error rate (MER) and failure rate (FR) of tongue edge feature point localization were 2.5% and 2.9%, better than the active shape model (ASM), active appearance model (AAM) and cascaded shape regression (CPR) algorithms. The tongue image segmentation model was constructed by deep learning to extract the tongue body from the image; evaluated against the ground truth, the mean intersection over union (mIoU) was 91.05% and the pixel accuracy (PA) was 93.31%. 6. Disease prediction model with feature fusion of traditional Chinese and Western medicine: principal component analysis performed better overall than factor analysis; extracting 20 common factors gave the best result, with a cumulative variance contribution rate of 72.9351%. In model 1, the highest accuracy was 81.16%, the highest sensitivity 82.57% and the highest specificity 84.80%. In model 2, the highest accuracy was 85.13%, the highest sensitivity 83.07% and the highest specificity 85.25%. In model 3, the accuracy was 87.23%, the sensitivity 83.15% and the specificity 86.25%. Conclusion: 1. The syndrome elements of type 2 diabetes mainly involve the liver, kidney, stomach, heart and spleen, and mainly include hyperactivity of heat, Yin deficiency, Qi deficiency, Yang deficiency, blood stasis, phlegm turbidity and blood deficiency. The syndrome types of patients without nephropathy are mainly deficiency of Qi and Yin, deficiency of Qi and Yin with blood stasis, and deficiency of liver and kidney Yin, whereas those of patients with nephropathy are mainly deficiency of Qi and Yin with blood stasis, deficiency of liver and kidney
Yin with blood stasis, and deficiency of liver and kidney Yin. 2. By constructing the syndrome classification model with exploratory factor analysis and a neural network, standardized syndrome diagnosis of type 2 diabetes can be achieved. 3. An automatic tongue image segmentation model can be constructed with deep learning, enabling objective extraction of the tongue image. 4. On this basis, a hybrid deep neural network disease prediction method was constructed that integrates comprehensive clinical data, syndrome types and tongue image data. The method uses principal component analysis, exploratory factor analysis and a deep neural network structure, and shows good prediction performance. 5. Syndrome types and tongue images improve the performance of the disease prediction model: the best accuracy rose from 81.16% (model 1) to 85.13% (model 2) and 87.23% (model 3). The fusion of multimodal features from traditional Chinese and Western medicine can improve the performance of the prediction model for type 2 diabetes mellitus complicated with nephropathy. |
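
The evaluation indexes named in the abstract (accuracy, sensitivity and specificity for the prediction models; pixel accuracy and mean intersection over union for tongue segmentation) can be illustrated with a minimal sketch. This is not the study's code: the function names `binary_metrics` and `segmentation_metrics` are hypothetical, labels are assumed to be 1 for diabetic nephropathy and 0 otherwise, masks are assumed to be flattened binary sequences (1 for tongue, 0 for background), and the mIoU here is assumed to be the average over those two classes, which may differ from the averaging used in the study.

```python
from typing import Dict, Sequence


def _check(a: Sequence[int], b: Sequence[int]) -> None:
    """Reject empty or mismatched inputs before computing any ratio."""
    if len(a) == 0 or len(a) != len(b):
        raise ValueError("inputs must be non-empty and of equal length")


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels.

    Assumption: 1 = diabetic nephropathy (positive), 0 = no nephropathy.
    """
    _check(y_true, y_pred)
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of true positives among all actual positives.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of true negatives among all actual negatives.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def segmentation_metrics(truth: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Pixel accuracy and mean IoU for flattened binary masks.

    Assumption: mIoU is averaged over the classes present
    (0 = background, 1 = tongue).
    """
    _check(truth, pred)
    pairs = list(zip(truth, pred))
    correct = sum(1 for t, p in pairs if t == p)
    ious = []
    for cls in (0, 1):
        inter = sum(1 for t, p in pairs if t == cls and p == cls)
        union = sum(1 for t, p in pairs if t == cls or p == cls)
        if union:  # skip a class absent from both masks
            ious.append(inter / union)
    return {
        "pixel_accuracy": correct / len(pairs),
        "miou": sum(ious) / len(ious),
    }


if __name__ == "__main__":
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
    print(segmentation_metrics([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0]))
```

Reporting sensitivity and specificity alongside accuracy matters here because the nephropathy and non-nephropathy groups need not be balanced, so accuracy alone can hide poor detection of one group.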