Study On Syndrome Differentiation Of Diabetic Nephropathy Based On Complex Network Theory

Posted on: 2017-05-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Tong
Full Text: PDF
GTID: 1104330482485734
Subject: Basic Theory of TCM
Abstract/Summary:
Objective
To solve the following practical problems that arise in modeling work based on complex network theory, against the background of research on computer-aided differentiation of diabetic nephropathy:
1. To present and analyze the data of diabetic nephropathy and explore a processing method for multi-label data, so as to remove the deviation from the actual classification results that single-label learning causes.
2. To establish a multi-label feature selection method that addresses the serious impact on model performance of high-dimensional, sparse data and the lack of typical feature combinations.
3. To set up a multi-label syndrome differentiation model that can handle the diagnosis of composite and plural syndromes.

Methods
1. Based on the multi-label data for diabetic nephropathy and after extensive literature research, we construct a diabetic nephropathy syndrome differentiation network (DNBZN) using complex network theory, in order to obtain a reasonable and effective representation of DN.
2. Because the data are high-dimensional and sparse, we first preprocess the diabetic nephropathy data with a new feature selection method. The method is based on the generalized social cooperation network in complex network theory and uses the overlapping community detection algorithm Bitector to find overlapping communities, in order to select strongly representative, typical feature combinations. We then build a multi-label feature dataset from the community detection results and represent it in a structured way; this structured dataset is used in the following chapters, which focus on classification modeling.
3. For multi-label classification, we apply four machine learning methods: SVM, AdaBoost, RBF neural network, and K nearest neighbor. We also try combining several basic classifiers into different multi-label classifiers. We search for the optimal differentiation model for diabetic nephropathy by training the models and tuning their parameters.
4. To evaluate the multi-label classification models, we apply five evaluation indices (Hamming Loss, Ranking Loss, One-error, Coverage, and Average Precision) to assess the performance of the multi-label syndrome differentiation models comprehensively.

Results
1. Through literature research, we collected 113 symptoms (features) and 15 syndromes (labels) from 256 articles; each symptom belongs to at least 1 and at most 6 labels. The DNBZN has 113 symptom nodes and 15 syndrome nodes. Edges represent the specificity of a symptom for a syndrome, and edge weight is quantified using the Gini Index. Community detection found overlapping communities for Yin deficiency of liver and kidney and Qi and Yin deficiency; overlapping communities for Qi deficiency of spleen and kidney, Yang deficiency of spleen and kidney, and Yin and Yang deficiency; and 10 other non-overlapping communities. Nodes within a community form strongly representative, typical feature combinations; nodes outside the communities are regarded as redundant or irrelevant features and are deleted. Validation of the feature selection results shows that the selected features are reasonable and conform to TCM theory and clinical practice.
2. Based on the literature research and network construction, we established a feature dataset with 113 features and 15 labels, and built the relationships between the labels and both individual features and feature combinations. There are 189 relationships between individual features and labels; after expanding to the relationships between feature combinations and labels, the dataset contains 1759 relationships between features and labels.
3. In the "problem transformation" strategy, we set up many basic binary classifiers using SVM and AdaBoost. For SVM we choose five kernel functions (linear, quadratic, polynomial, radial basis, and multilayer perceptron), and the binary classification accuracy for each label is close to 98%. For AdaBoost we choose three variants (Real AdaBoost, Gentle AdaBoost, and Modest AdaBoost) as binary classifiers, and the binary classification accuracy for each label is more than 97%. In the "algorithm adaptation" strategy, we apply an RBF neural network and the K nearest neighbor algorithm directly to the multi-label feature dataset to classify multiple labels. Comparing classification accuracy for K from 1 to 10, we found that the model reaches its highest classification accuracy, 94.67%, when K is 2 or 6.
4. We compute Hamming Loss, Ranking Loss, One-error, Coverage, and Average Precision using 10-fold cross-validation to evaluate the four syndrome differentiation models comprehensively. The results show that all four models perform well and achieve satisfactory classification accuracy. In comparison, the comprehensive performance of SVM is the best, followed by AdaBoost and RBF; the comprehensive performance of KNN is relatively weak.

Conclusion
1. Multi-label learning is better suited to TCM clinical practice. Effective exploration of multi-label data representations and machine learning strategies can improve the accuracy of computer-aided differentiation, improve the diagnosis of composite and plural syndromes, and provide more references for clinical treatment.
2. Feature selection based on overlapping community detection in complex networks can effectively extract strongly representative, typical feature combinations and can significantly improve classification performance. It is a new and effective method for feature selection in diabetic nephropathy.
3. The modeling strategy used in this work is suitable for multi-label classification, and its classification performance is satisfactory; the models can also be applied to high-dimensional, sparse, and nonlinear data and to other problems in the field of traditional Chinese medicine.
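The five evaluation indices named in the abstract can be illustrated with a minimal Python sketch. It follows the definitions commonly used in the multi-label learning literature and is not the dissertation's own code. The function names, the toy data, and the 0.5 threshold behind the predicted label sets are all assumptions made for illustration. The sketch also assumes every sample has at least one relevant and one irrelevant label, and it breaks score ties by label index.

```python
from typing import List, Sequence, Set


def ranks(scores: Sequence[float]) -> List[int]:
    """Rank labels by descending score; the top-scoring label gets rank 1."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    rank = [0] * len(scores)
    for position, label in enumerate(order, start=1):
        rank[label] = position
    return rank


def hamming_loss(true_sets: Sequence[Set[int]], pred_sets: Sequence[Set[int]],
                 n_labels: int) -> float:
    """Fraction of (sample, label) pairs that are misclassified."""
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * n_labels)


def one_error(true_sets: Sequence[Set[int]],
              scores: Sequence[Sequence[float]]) -> float:
    """Fraction of samples whose top-ranked label is not a true label."""
    misses = 0
    for t, s in zip(true_sets, scores):
        top = max(range(len(s)), key=lambda j: s[j])
        if top not in t:
            misses += 1
    return misses / len(true_sets)


def coverage(true_sets: Sequence[Set[int]],
             scores: Sequence[Sequence[float]]) -> float:
    """Average depth in the ranking needed to cover all true labels, minus 1."""
    total = 0
    for t, s in zip(true_sets, scores):
        r = ranks(s)
        total += max(r[j] for j in t) - 1
    return total / len(true_sets)


def ranking_loss(true_sets: Sequence[Set[int]],
                 scores: Sequence[Sequence[float]]) -> float:
    """Average fraction of (true, false) label pairs ordered incorrectly."""
    total = 0.0
    for t, s in zip(true_sets, scores):
        irrelevant = [j for j in range(len(s)) if j not in t]
        bad = sum(1 for a in t for b in irrelevant if s[a] <= s[b])
        total += bad / (len(t) * len(irrelevant))
    return total / len(true_sets)


def average_precision(true_sets: Sequence[Set[int]],
                      scores: Sequence[Sequence[float]]) -> float:
    """Average fraction of true labels ranked at or above each true label."""
    total = 0.0
    for t, s in zip(true_sets, scores):
        r = ranks(s)
        per_label = sum(sum(1 for k in t if r[k] <= r[j]) / r[j] for j in t)
        total += per_label / len(t)
    return total / len(true_sets)


# Hypothetical toy data: 2 samples, 3 labels (not taken from the dissertation).
true_sets = [{0, 2}, {1}]
scores = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.1]]
pred_sets = [{j for j, v in enumerate(s) if v >= 0.5} for s in scores]

print("Hamming Loss:", hamming_loss(true_sets, pred_sets, 3))
print("One-error:", one_error(true_sets, scores))
print("Coverage:", coverage(true_sets, scores))
print("Ranking Loss:", ranking_loss(true_sets, scores))
print("Average Precision:", average_precision(true_sets, scores))
```

Lower values are better for the first four indices, and higher values are better for Average Precision. In a 10-fold cross-validation setting such as the one described, each index would typically be computed on every held-out fold and then averaged.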
Keywords/Search Tags: diabetic nephropathy, syndrome differentiation, machine learning, modeling, complex networks, community detection