Comparison Of Classification Methods For Property Of Chinese Medicinal Herbs Based On Original Characters

Posted on: 2012-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 2214330338964107
Subject: Epidemiology and Health Statistics
Abstract/Summary:
"Cold", "cool", "warm" and "hot", known as "the Four Natures" or the property of Traditional Chinese Medicine (TCM), are deduced in Chinese materia medica from the body's responses after a medicine is given to patients. Dried ginger, for example, dispels cold and brings warmth, so it is characterized as "hot". Isatis tinctoria, whose root is used as the medicine Banlangen and whose leaves are used as Daqingye, clears heat and detoxifies, so it is characterized as "cool". In essence, cool and warm properties can be regarded as low-level cold and hot properties, respectively. Therefore, the property of TCM is generally described as cold or hot [1-2].
TCM physicians [3] hold that diseases arise and develop mainly because pathogenic factors act on the human body, bringing Yin and Yang out of balance and disturbing the functions of the organs and meridians. The nature of an herbal medicine is related to its degree of Yin and Yang: cool and cold (extreme Yin), warm and hot (extreme Yang). TCM physicians therefore use the nature of a medicine to correct this imbalance, following the rule "treat cold syndromes with hot herbs, and treat heat with cold", so that the organs and meridians function properly again and the patient recovers [4]. Accordingly, physicians write a personalized prescription to restore balance after determining the energetic temperature and functional state of the patient's body. For example, a "cool" medicine of Yin nature, such as the Banlangen or Daqingye mentioned above, would be a doctor's first choice for a patient with fever. In this sense, the "Four Natures" theory, on the one hand, guides doctors in writing prescriptions to treat patients.
On the other hand, the "Four Natures" theory is pivotal for the development of new drugs and Chinese medicines [5].
TCM researchers have carried out many modern studies on the property theories of TCM. SUN Ruo-qiong [6] explored the correlation between the flavor and property of TCM with a structural equation model and pointed out that property is correlated with the five flavors, with sweet and sour having the strongest influence on the property of drugs. Liu Hui [7] explored the property traits of fruit-type TCM and concluded that it shows distinct properties: the percentage of TCM with warm property is higher in the fruit group than in the control group, while the percentage with cold property is lower than in the control group, which may imply a common material basis. Zhang Yong-qing [8] summarized that there is a close relationship between the property of TCM and its medicinal part. Liu Jin and Deng Jia-gang [9] concluded that the property of Chinese medicines is related to their inorganic elements; in particular, the amounts of potassium (K) and magnesium (Mg) play the most powerful role in discriminating peaceful from non-peaceful properties. Tang Shi-huan [10] studied the impact of physical environmental factors on the nature of TCM and found that the nature of TCM results from the combined action of environmental factors such as climate, soil, biology and topography.
TCM can be made from plant parts (known as Chinese herbal medicine, CHM), human parts, animal parts, and minerals. Chinese herbal medicine plays a pivotal role in health and everyday social life in Asia, especially in China.
At present, Chinese medicines are increasingly popular and are applied worldwide in hospitals, clinics and community health centers, in the belief held by many TCM practitioners that TCM/CHM can radically treat many internal medicine complaints and diseases without the toxicity and side effects of Western medicines. TCM experts contend that, in essence, cool and warm properties can be regarded as low-level cold and hot properties, respectively. Therefore, this study focuses mainly on Chinese herbal medicine (CHM) with a "hot" or "cold" nature.
In this study, information on the properties and original characters of 1,725 kinds of CHM was collected from "Chinese Herbal Medicine" [11], compiled by the State Administration of Traditional Chinese Medicine of the People's Republic of China. The original characters (predictor variables) cover 23 blocks with 523 variables: medicinal parts, collecting seasons, processing methods, distribution areas, ecological habits, growing environment, growing characteristics, plant categories, root characters, stem characters, leaf characters, flower characters, fruit characters, seed characters, medicine shape, medicine color, medicine appearance characters, medicine texture, medicine cross-section characteristics, medicine flavor, microscopical characters, secondary metabolites, and pharmacologic function.
With the property of CHM as the response variable and the original characters as predictor variables, six classifiers were used to classify the property of CHM: Logistic discrimination (Logistic-DA), support vector machines (SVM), decision tree (DT), random forests (RF), principal components analysis-linear discriminant analysis (PCA-LDA) and partial least squares discriminant analysis (PLS-DA). The performance of these six classifiers was assessed by 10 iterations of 5-fold cross-validation and by hold-out prediction, to confirm the best discrimination method and to offer advice on quickly distinguishing the properties of TCM from original characters.
Research results are as follows:
1. Back-substitution accuracy results: Six discriminant analysis models were applied to classify CHM property based on original characters. The back-substitution accuracy for PCA-LDA is the lowest, at 89.10%, while the other classifiers are above 90%. The accuracies of the support vector machine and decision tree classifiers are especially high, at 96.35% and 93.80%, respectively.
2. Robustness assessment results: We assessed the robustness of classifier performance by the overall average prediction error rate over 10 iterations of 5-fold cross-validation on the collection of 1,725 CHM. The CV accuracy is over 83% for all models except the Logistic model (77.47%); the highest is random forest, at 89.58%.
3. Predictive accuracy assessment results: The 1,725 CHM were divided randomly into a training set of 1,380 CHM (80%) and a test set of 345 (20%). First, the Logistic-DA, SVM, DT, RF, PCA-LDA and PLS-DA methods were used to build classifiers on the training set; then the classifiers were applied to predict the property of the CHM in the test set.
Predictive accuracies are above 80% for all models except the Logistic discriminant, with random forest and PLS-DA scoring highest among the six classification models.
4. Based on the optimal discriminant model, the partial least squares discriminant model, and the actual original-character information of the 1,725 Chinese herbal medicines (CHM), the CHM property features were visualized, yielding the "cold" or "hot" propensity of each original character and a theoretical property value for every CHM.
5. Seventy-two original characters, those that contribute most to predicting the property of CHM (CHMP), were selected to establish a simplified PLS discriminant model under the following rule: the absolute coefficient value must be at least 0.2 and the variable importance in projection (VIP) at least 1.5. The accuracies of back-substitution, cross-validation and hold-out prediction are all over 72% with the simplified PLS discriminant model.
Conclusion:
1. In the present work, we proposed that the original characters of CHM have a strong influence on CHMP and tested this hypothesis with six well-known classification methods (Logistic-DA, SVM, DT, RF, PCA-LDA, PLS-DA). As the back-substitution, cross-validation and prediction accuracies exceed 80% for five of the six models (Logistic-DA falls below 80% in cross-validation and hold-out prediction), our hypothesis was strongly supported.
2. On the basis of model assessment, including 10 iterations of 5-fold cross-validation and prediction on the test dataset, we conclude that: (1) Since the original characters are clearly correlated and high-dimensional, classical Logistic discriminant analysis cannot predict the property of CHM as accurately as the other classification models; (2) the SVM and DT models show good robustness, with high cross-validation accuracy, yet weak prediction, because of the large gap between their accuracy on the training set and on the test set; (3) RF performs very well.
RF not only discriminates the property of CHM with high accuracy but also provides Gini values for evaluating each predictor's importance in distinguishing property. However, this model gives no indication of the sign of each predictor, that is, the direction of the correlation between an original character and CHMP; (4) PCA-LDA finds a set of orthogonal principal components (linear combinations of the original independent variables) that account for the maximum variation in the independent variables for dimension reduction, yet fails to take into account the information on sample classification provided by the dependent variable; (5) PLS-DA, which uses information on both independent and dependent variables by extracting PLS components for dimension reduction, performs better in classification and prediction than PCA-LDA. In conclusion, if the aim of the research is merely to distinguish property, the RF model suffices; if the aim is to identify the CHM original characters related to cold and hot property, the PLS-DA model is optimal.
3. According to the visualization of the "cold" and "hot" original character value distribution for the 1,725 CHM under the optimal model, PLS-DA, original characters correlated with CHMP objectively exist. Moreover, any given CHM, whether of "cold" or "hot" property, contains both "cold"-related and "hot"-related original characters. The overall property propensity of a CHM emerges from how these "cold"-related and "hot"-related characters combine in quality and quantity.
4. The study uses the coefficients of the CHMP-related original characters and the VIP values of the PLS-DA model to simplify the statistical model, under the variable selection criteria of an absolute coefficient equal to or above 0.2 and a VIP value equal to or above 1.5.
A simplified PLS-DA model based on 72 CHMP-related original characters is established to quickly predict the property of CHM. Its overall accuracy is 74%, which is adequate for quickly discriminating the properties of TCM from original characters.
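The evaluation protocol and the variable selection rule in the abstract are mechanical enough to sketch in code. What follows is a minimal standard-library Python sketch, not the thesis's implementation. The function names, the random seed, and the toy coefficient and VIP values are assumptions introduced for illustration. Only the sample count (1,725), the 80%/20% split, the thresholds (0.2 and 1.5) and the cross-validation scheme come from the abstract, and the sketch reads that scheme as 10 repetitions of 5-fold cross-validation.

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def repeated_kfold(n_samples, n_splits=5, n_repeats=10, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repetition
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for k in range(n_splits):
            test = folds[k]
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, test


def select_characters(coefficients, vip, min_abs_coef=0.2, min_vip=1.5):
    """Keep characters with |coefficient| >= min_abs_coef and VIP >= min_vip."""
    return [
        name
        for name in coefficients
        if abs(coefficients[name]) >= min_abs_coef and vip[name] >= min_vip
    ]


if __name__ == "__main__":
    train, test = holdout_split(1725)
    print(len(train), len(test))

    n_rounds = sum(1 for _ in repeated_kfold(1725))
    print(n_rounds)

    # Hypothetical character names and values, for illustration only.
    toy_coefficients = {"char_a": 0.25, "char_b": -0.30, "char_c": 0.10}
    toy_vip = {"char_a": 1.6, "char_b": 1.5, "char_c": 2.0}
    print(select_characters(toy_coefficients, toy_vip))
```

In practice each (train, test) pair would be passed to a classifier and the per-round accuracies averaged. The selection function filters on the absolute coefficient, so characters are kept whichever direction their coefficient points.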
Keywords/Search Tags:Property of Traditional Chinese Medicine, Discriminatory Analysis, Logistic Discrimination, Support vector machines, Decision Tree, Random Forest, Principal component analysis, Partial least squares