
A Literature Study On Commonly Used Data Analysis Methods In Syndrome Study And Analysis Of Depression Based On Implicit Model

Posted on: 2016-04-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W D Cai
GTID: 1104330461993159
Subject: Diagnostics of Chinese Medicine
Abstract/Summary:
Syndrome is the core of traditional Chinese medicine (TCM) diagnostic theory and distinguishes it clearly from modern medicine. This study takes the integration of modern-medicine disease and Chinese-medicine syndrome (the disease-syndrome model) as its major premise. A data model for syndrome diagnosis is constructed with objective, unsupervised mathematical methods, providing a basis for subsequent unsupervised syndrome analysis.

At present, research on TCM syndrome elements mainly applies data analysis to results obtained with internationally recognized methods such as clinical epidemiological surveys. However, syndrome data analysis still faces many problems. Scholars use a wide range of statistical methods as tools, from traditional supervised descriptive statistics and artificial intelligence to unsupervised factor analysis and cluster analysis, hoping to establish an impartial mathematical model that is both qualitative and quantitative. Yet the variable assumptions underlying these methods are inconsistent with the characteristics of symptom variables in TCM syndromes, and the distributions they assume differ from the distributions of TCM syndromes and symptoms. As a result, the sample data fail to meet the requirements of the statistical methods. Because the statistical results are hard to interpret and lack clear cutoffs, subjective criteria are often needed to refine them. This turns an unsupervised method into a supervised one and leads to biased interpretation, violation of basic TCM principles, or interpretation without any basis.
Data analysis methods therefore cannot fully match the requirements of TCM syndromes and symptoms, and the standardization of data models for TCM syndromes and symptoms has come to a standstill. The methodology of syndrome data analysis is thus both a much-debated issue and a dilemma in research on the standardization and objectivity of TCM syndromes.

This research collects and analyzes recent literature on data analysis methods for TCM syndromes in order to summarize the merits, shortcomings, features and applicability of the major methods used in syndrome study. With the integration of disease and syndrome as its premise, depression is taken as the target disease. Signs and symptoms are collected from patients through a clinical epidemiological survey and analyzed with an unsupervised method. Based on the results of the clinical investigation of depression, latent class analysis is applied in accordance with both the basic theories of TCM diagnosis and the basic principles of statistics. The aim is to provide an objective basis for establishing TCM diagnostic standards and for future analysis.

Objectives
1. To summarize the current status and characteristics of the major data analysis methods used in syndrome study through a review of the recent literature.
2. To comment on the major unsupervised data analysis methods currently applied in syndrome study.
3. To classify and group the depression symptoms collected for the target disease using unsupervised latent class analysis.
4. To explore the applicability of latent class analysis in syndrome standardization research.

Methods
1. Collect and review the literature of the last twenty years on data analysis methods for TCM syndromes.
The information collected is stored in an MS Access database and then processed in SPSS 17.0 for detailed descriptive statistical analysis.
2. Based on the results of the clinical investigation of depression, a latent class model is used to categorize the TCM symptom variables (manifest variables). Latent class analysis is applied to the data in each manifest group to determine the number of latent classes and the symptom variables belonging to each (latent variables). A new database containing these latent class results is then generated for a second round of analysis, which extracts the initial categories of syndrome elements for depression.

Results
1. Review of the recent literature on the major data analysis methods used in syndrome study
Of the 1289 recent publications collected on data analysis methods in TCM syndrome study, 498 qualified for further statistical analysis. The results are as follows.
(1) Current status of the major data analysis methods used in syndrome study
Of the 498 publications, 337 (68%) used unsupervised data analysis methods and the remaining 161 (32%) used supervised methods. Unsupervised analysis is thus becoming the main direction in syndrome study, as a means of providing an objective basis for TCM diagnostic standards. Cluster analysis and factor analysis are the two most frequently used methods: 207 publications used cluster analysis and 153 used factor analysis. The two methods can be used separately or combined in a single study, and they are often used together.
In terms of the data model applied, most studies are based on the TCM diagnostic model (88%) and target the distribution of symptoms (74%). In terms of data sources, signs and symptoms are usually collected from patients by clinical epidemiological survey, an internationally recognized method of medical research.
(2) Comment on factor analysis applied in syndrome study
Among the recent publications using factor analysis, 29 (19%) used a correlation coefficient matrix and 3 (2%) used a covariance matrix. First, factor analysis requires continuous variables, whereas most recent syndrome studies used categorical variables, which violates this basic requirement. In addition, the correlation coefficient matrix is essentially a simplified form of the covariance matrix and loses important information, so the covariance matrix is considered more suitable for factor analysis. Regarding precondition tests, 72 publications reported a KMO value, with a mean of 0.69; 36 of them reported a KMO value greater than 0.7. In modern statistics a KMO of 0.7 is the usual threshold and is regarded as only marginally acceptable, which implies that the conditions for applying factor analysis in syndrome study are not ideal. The 19 publications that reported correlation coefficients had a mean coefficient of 0.58; coefficients of 0.40 to 0.69 are regarded as moderate correlation. Almost half of the publications reporting correlation coefficients are therefore only marginally acceptable. Furthermore, the factor analysis used is mainly exploratory, with initial factors extracted by the principal component method.
However, the principal component method explains the source data in the covariance matrix with the smallest possible number of factors, which does not comply with the basic theories of TCM diagnosis.
(3) Comment on cluster analysis applied in syndrome study
Among the recent publications using cluster analysis, 159 (76%) used distance measures, 48 (23%) used correlation coefficient statistics and 2 (1%) used fuzzy recognition methods. Distance measures are defined for continuous variables, and the correlation coefficient statistics are converted from categorical symptom variables, so most of these studies do not satisfy the statistical assumptions. Of the cluster analysis publications, 174 (84%) studied symptom distribution and 33 (16%) studied symptoms together with syndrome elements; 116 (68%) used variable clustering and 91 (32%) used sample clustering. Clustering is therefore the main process in cluster analysis, and it does not take the relationships between symptoms into account. Furthermore, 56 publications (27%) used correlation coefficients, 55 (27%) used the nearest center coordinate method, 53 (26%) used the groups-linkage method and 21 (10%) used the sum of squared deviations method. Many methods can thus be used to calculate distance, and different methods produce different outcomes. Cluster analysis, and K-means clustering in particular, is therefore not appropriate for TCM syndrome study.
2. Analysis of manifest variables in the latent class model, with depression as the target disease
Psychiatric symptoms are classified into 6 groups of manifest variables:
Group 1: slow reaction, delayed mind, decline of thinking, pessimism, dysphoria, slow motion, irresolute.
Group 2: scatterbrained, slow reaction, decline of thinking.
Group 3: timid and propensity to be frightened, dysphoria, amnesia.
Group 4: not applicable.
Group 5: irritability, phobia.
Group 6: timid and propensity to be frightened, decline of thinking.
Cold, heat and diet symptoms are classified into 5 groups of manifest variables:
Group 1: dry mouth and throat, thirst and polydipsia.
Group 2: not applicable.
Group 3: dry mouth and throat.
Group 4: feverish sensation in the palms and soles, intolerance of coldness, self sweating, self induced heat.
Group 5: fatigue, intolerance of coldness.
Head and facial symptoms are classified into 3 groups of manifest variables:
Group 1: sensation of drowsiness and heaviness in the head.
Group 2: pale complexion.
Group 3: headache, yellow complexion, tinnitus.
Chest, abdomen and body symptoms are classified into 6 groups of manifest variables:
Group 1: dyspnea, shortness of breath.
Group 2: not applicable.
Group 3: palpitation, lassitude in the loins and knees, preference for sighing, anorexia and readiness to vomiting, chest distress, sensation of fullness in epigastrium, sensation of oppression in abdomen, feeling heavy in the limbs.
Group 4: shortness of breath, sensation of fullness in epigastrium, dyspnea, sensation of oppression and fullness in the chest.
Group 5: lassitude in the loins and knees, feeling heavy in the limbs.
Group 6: sensation of fullness in epigastrium.
State of sleep, stool and urine symptoms are classified into 4 groups of manifest variables:
Group 1: insomnia with easiness to be wakened, loose stool.
Group 2: loose stool.
Group 3: dreaminess, easy to be wakened, easy to fall asleep and easy to be wakened.
Group 4: less sleep, sticky stool, insomnia with easiness to be wakened.
Inspection of the tongue symptoms are classified into 6 groups of manifest variables:
Group 1: light red tongue, white tongue coating.
Group 2: red tongue, yellow tongue coating.
Group 3: light red tongue, yellow tongue coating.
Group 4: white tongue coating, purple tongue, swelling of the tongue, pale tongue.
Group 5: red tongue, white tongue coating.
Group 6: yellow and white tongue coating, thick tongue coating.
Pulse condition symptoms are classified into 4 groups of manifest variables:
Group 1: thready pulse, weak in pulsation, deep pulse.
Group 2: taut pulse.
Group 3: slippery pulse, forceful in pulsation, rapid pulse.
Group 4: deep pulse, forceful in pulsation.

A database is built from the above classification results. A second latent class analysis is then applied to this overall syndrome database, which finally yields 4 latent variables:
(1) Deficiency of liver and spleen: timid and propensity to be frightened, decline of thinking, loose stool, pale complexion, red tongue, yellow tongue coating, taut pulse.
(2) Stagnancy of the liver-fire, incompatibility of liver and stomach: irritability, phobia, dry mouth and throat, dyspnea, shortness of breath, dreaminess, easy to be wakened, easy to fall asleep and easy to be wakened, red tongue, yellow tongue coating, taut pulse.
(3) Stomach invasion of liver-fire: slow reaction, delayed mind, decline of thinking, pessimism, dysphoria, slow motion, irresolute, dry mouth and throat, thirst and polydipsia, sensation of drowsiness and heaviness in the head, shortness of breath, sensation of fullness in epigastrium, dyspnea, sensation of oppression and fullness in the chest, less sleep, sticky stool, insomnia with easiness to be wakened, red tongue, white tongue coating, thready pulse, weak in pulsation, deep pulse.
(4) Deficiency of kidney and stagnancy of yang: timid and propensity to be frightened, dysphoria, amnesia, fatigue, intolerance of coldness, sensation of drowsiness and heaviness in the head, lassitude in the loins and knees, feeling heavy in the limbs, dreaminess, easy to be wakened, easy to fall asleep and easy to be wakened, light red tongue, white tongue coating, deep pulse, forceful in pulsation.

Conclusion
1. Review of the recent literature on the major data analysis methods used in syndrome study
(1) There is a trend toward unsupervised data analysis based on large samples collected through clinical epidemiological surveys. Cluster analysis and factor analysis are the two unsupervised methods most frequently used in recent syndrome study. However, both still have problems with data types, methods of application and so on. An overall assessment of each data analysis method was therefore carried out.
(2) Based on the analysis of the recent literature on factor analysis in syndrome study, its current applicability is assessed from four perspectives (variable matrix, number of factors, factor extraction method and factor rotation method), and the following technical issues are raised:
1. Are categorical TCM symptom variables applicable in syndrome diagnosis? Is the simplified correlation coefficient matrix applicable in symptom study?
2. Is it appropriate to determine the number of factors by the criterion eigenvalue ≥ 1?
3. Is it appropriate to extract factors from the matrix using the minimal number of factors? Does this comply with TCM diagnostic theory?
4. Orthogonal rotation is frequently used in the recent literature and assumes that no relationship exists between factors. Does this comply with the requirements of TCM syndrome study?
Factor analysis is considered not applicable in syndrome study if the symptom variables do not meet its assumptions, if the precondition tests on the data are not satisfactory, or if the requirements of the method are otherwise not met.
(3) Based on the analysis of the recent literature on cluster analysis in syndrome study, the following technical issues are raised:
1.
The clustering statistics used in systematic (hierarchical) cluster analysis are distance measures and correlation coefficients, which are defined for continuous variables and so do not meet the requirements of TCM syndrome categorization.
2. There are many different ways to calculate distance, and they produce different outcomes, so it is difficult to decide which should be used in syndrome study.
3. Cluster analysis takes clustering as its main process and therefore does not account for the relationships between symptoms, which does not comply with the basic theories of TCM diagnosis.
As such, cluster analysis is considered not applicable in syndrome study.
2. Analysis of manifest variables in the latent class model, with depression as the target disease
(1) Using unsupervised latent class analysis, the seven groups of depression symptoms are each further categorized into their associated manifest variable groups. All of the resulting manifest variables can be interpreted in accordance with the basic theories of TCM diagnosis.
(2) Using unsupervised latent class analysis, the depression symptom data and their combined manifest variables are finally classified into four groups (latent variable groups): deficiency of liver and spleen; stagnancy of the liver-fire; stomach invasion of liver-fire; deficiency of kidney and stagnancy of yang.
(3) Examining latent class analysis as an unsupervised method for depression shows that the syndromes it identifies are reasonably similar to TCM syndromes. The final outcome of the latent class analysis is therefore consistent with both the basic theories of TCM diagnosis and the basic principles of statistics.
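
For readers unfamiliar with latent class analysis, the general technique can be sketched in a few lines. The Python below is an illustrative sketch only, not the dissertation's procedure: it assumes each symptom is coded as present (1) or absent (0), fits a single-stage model by expectation-maximization, and compares class counts with BIC. The abstract does not say how the number of classes was chosen or which software performed the analysis, so the function names `fit_lca` and `bic`, the synthetic data, and the BIC criterion are all assumptions made for illustration.

```python
import numpy as np


def fit_lca(X, n_classes, n_iter=500, tol=1e-8, seed=0):
    """Fit a latent class model to a binary matrix X (patients x symptoms) by EM.

    Returns class weights, per-class symptom probabilities, posterior class
    memberships, and the log-likelihood from the last E-step.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    weights = np.full(n_classes, 1.0 / n_classes)          # P(class)
    probs = rng.uniform(0.25, 0.75, size=(n_classes, m))   # P(symptom present | class)
    prev_ll = -np.inf
    ll = prev_ll
    for _ in range(n_iter):
        # E-step: log P(x_i, class k), with symptoms independent within a class
        log_joint = (np.log(weights + 1e-12)
                     + X @ np.log(probs).T
                     + (1.0 - X) @ np.log(1.0 - probs).T)
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        post = np.exp(log_joint - log_norm)                # P(class k | x_i)
        ll = float(log_norm.sum())
        # M-step: re-estimate parameters from the soft assignments
        weights = post.mean(axis=0)
        probs = (post.T @ X) / (post.sum(axis=0)[:, None] + 1e-12)
        probs = np.clip(probs, 1e-6, 1.0 - 1e-6)           # keep logarithms finite
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, probs, post, ll


def bic(log_lik, n_samples, n_items, n_classes):
    """Bayesian information criterion for a binary latent class model."""
    n_params = (n_classes - 1) + n_classes * n_items
    return -2.0 * log_lik + n_params * np.log(n_samples)


if __name__ == "__main__":
    # Synthetic stand-in for survey data (hypothetical, not the study's data):
    # two patient groups with opposite symptom profiles.
    rng = np.random.default_rng(1)
    true_probs = np.array([[0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
                           [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]])
    labels = rng.integers(0, 2, size=400)
    X = (rng.random((400, 6)) < true_probs[labels]).astype(float)
    for k in (1, 2, 3):
        _, _, _, ll = fit_lca(X, k)
        print(k, "classes, BIC:", round(bic(ll, *X.shape, k), 1))
```

Unlike distance-based clustering, this model treats the symptoms as categorical indicators and assigns each patient a probability of belonging to each class, which is the property the abstract argues makes it a better fit for syndrome data. A two-stage analysis like the one summarized above would apply such a fit within each symptom category first and then again to the combined class memberships.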
Keywords/Search Tags:syndrome, depression, data analysis, latent class analysis