
The Research On Data Mining In Disease Records Of Several Diseases Of Gynecology In Traditional Chinese Medicine

Posted on: 2007-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Li
GTID: 2144360182993093
Subject: Information Science
Abstract/Summary:
The history of Traditional Chinese Medicine (TCM) spans thousands of years. Data in this field have accumulated in massive amounts and continue to grow rapidly. Facing this vast treasure, many TCM informatics researchers are working to extract more valuable information, more deeply and more quickly, with a variety of methods. In this thesis, data mining technology is introduced to analyze disease records in depth, and several concrete examples are given together with their results.

1 Background

Association analysis, one of the techniques of data mining, has been used in TCM informatics research for about four to five years. Although some results have been achieved, previous research remains unsatisfactory in the following respects. First, the original data lacked standardization, which limited the scale of the databases and therefore the effectiveness of data mining in TCM informatics. Second, the original data came only from disease records and formulas, so the results were largely restricted to the compatibility of Chinese medicinal herbs. Third, the results were mostly presented without in-depth analysis, which made them difficult to understand for readers without knowledge of data mining.

2 Research content

The research proceeds as follows: defining the target, collecting data, normalizing the data as preprocessing, processing the data, and evaluating and analyzing the results.

Target

The research applies association analysis, an important data mining technique, to the disease records of several gynecological diseases. The focus is on prescriptions, symptoms and syndromes.

Data collection

After a survey of journal articles on TCM gynecology, the four diseases with the largest numbers of relevant articles were chosen: uterine bleeding, amenorrhea, infertility and dysmenorrhea. The authors of these disease records are famous TCM experts born in the 1920s and 1930s.
The records were made between 1972 and 2005 and published in journals and books. The collection also followed these rules: the prescription, symptoms, syndrome and diagnosis must all be included in the record; the diagnosis follows the TCM industry standard; the prescription contains only oral herbal medicine; the syndrome must state both the location and the property of the disease; and the record must contain more than one symptom. The final collection comprises 2138 disease records: 664 uterine bleeding records, 408 amenorrhea records, 631 infertility records and 476 dysmenorrhea records.

Data preprocessing

Chinese medicinal herb names are not recorded uniformly: the records contain aliases, incorrect characters and abbreviations. The main work of this step is to revise them into standard names, taken from an authoritative reference book on Chinese medicinal herbs. Because the processing of a Chinese medicinal herb has an obvious effect on its therapeutic properties, the processing method is marked after the herb name.

Syndromes are expressed in many different patterns in the disease records. The first step is to give them a uniform structure: the standard syndrome is a subject-predicate phrase, in which the subject is the location of the disease and the predicate is the property of the disease. The second step is to count the occurrences of the various synonyms of each syndrome term; the most frequently used term becomes the standard syndrome. Symptoms are normalized in the same way: each is rewritten as a subject-predicate phrase, and the most frequently used term becomes the standard symptom.

After the data were normalized, a database was constructed containing the 2138 disease records of the four gynecological diseases. Each record has four fields: symptom, syndrome, prescription and diagnosis.
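The collection rules and normalization steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's actual procedure: the alias table `HERB_ALIASES`, the `DiseaseRecord` type and the helper names are hypothetical, only the machine-checkable collection rules are tested, and synonym groups are assumed to have been identified (and rewritten as subject-predicate phrases) beforehand.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical alias table: recorded (non-standard) herb name -> standard name.
# The thesis takes standard names from an authoritative herbal reference;
# these two entries only illustrate the kinds of variants involved.
HERB_ALIASES = {
    "Guo Lao": "Gan Cao",  # traditional alias (illustrative)
    "Gancao": "Gan Cao",   # spelling variant (illustrative)
}


@dataclass
class DiseaseRecord:  # hypothetical record layout with the four fields
    diagnosis: str
    syndrome: str
    symptoms: list[str]
    prescription: list[str]


def meets_inclusion_rules(rec: DiseaseRecord) -> bool:
    """Check only the machine-checkable collection rules: all four fields
    present and more than one symptom. Oral-only prescriptions and syndromes
    naming location and property would still need manual review."""
    has_fields = bool(rec.diagnosis and rec.syndrome and rec.prescription)
    return has_fields and len(rec.symptoms) > 1


def normalize_herb(raw_name: str, processing: str = "") -> str:
    """Map a recorded herb name to its standard name and append the
    processing method after it, since processing changes the properties."""
    name = raw_name.strip()
    standard = HERB_ALIASES.get(name, name)
    return f"{standard} ({processing})" if processing else standard


def standardize_terms(record_terms: list[list[str]],
                      synonym_groups: list[list[str]]) -> dict[str, str]:
    """Build a variant -> standard mapping for syndromes or symptoms.

    Within each synonym group, the variant used most often across all
    records becomes the standard term. Ties go to the variant listed first,
    because max() returns the first maximal element.
    """
    counts = Counter(term for terms in record_terms for term in terms)
    mapping = {}
    for group in synonym_groups:
        standard = max(group, key=lambda term: counts[term])
        for variant in group:
            mapping[variant] = standard
    return mapping


if __name__ == "__main__":
    print(normalize_herb("Guo Lao"))                # Gan Cao
    print(normalize_herb("Bai Zhu", "stir-fried"))  # Bai Zhu (stir-fried)
    records = [["kidney deficiency"], ["deficiency of the kidney"],
               ["kidney deficiency"]]
    groups = [["deficiency of the kidney", "kidney deficiency"]]
    print(standardize_terms(records, groups)["deficiency of the kidney"])
    # kidney deficiency
```

Choosing the most frequent variant as the standard term mirrors the rule stated in the text; in practice the synonym groups themselves come from expert judgment, which this sketch does not attempt to automate.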
The database contains 605 distinct symptoms, 63 distinct syndromes and 754 distinct Chinese medicinal herbs.

Data processing

Unlike previous research, this research mines not only within each of the prescription, symptom and syndrome fields separately but also across the three fields. The results are frequent itemsets such as "medicine+medicine", "symptom+symptom" and "syndrome+syndrome", as well as "medicine+symptom", "medicine+syndrome" and "symptom+syndrome". Support and confidence are used to describe the mining results. There are no fixed values for support and confidence; in this research, the minimum support was initially set to 5% and the minimum confidence to 40%. The mining results were then ranked separately by support and by confidence, and the top ten items of each ranking were analyzed. The data mining was performed with WEKA, using a procedure developed by Dr. Zhou Xuezhong at the China Academy of Traditional Chinese Medicine.

Result analyses

There are 15 result tables: 5 for "Chinese medicinal herb", 5 for "Chinese medicinal herb and symptom" and 5 for "Chinese medicinal herb and syndrome". When checked against classical TCM theory, most of the mining results were found in it: the agreement rate is 70% for "Chinese medicinal herb", 39% for "Chinese medicinal herb and symptom" and 100% for "Chinese medicinal herb and syndrome". The results that do not agree with classical theory were also analyzed in depth, drawing on clinical experience.

The method was also compared with a statistical method: the "Chinese medicinal herb" results were compared with the results of frequency statistics.
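To make the Data processing step and the comparison with frequency statistics concrete, here is a minimal pair-level association sketch in Python. It is not the WEKA procedure used in the thesis: it counts all 2-itemsets directly instead of using Apriori candidate pruning, it assumes each record has been flattened into a set of field-prefixed items (`herb:...`, `symptom:...`, `syndrome:...`, a hypothetical encoding), and it only reuses the thesis's initial thresholds (support 5%, confidence 40%). The helper names and the toy records are invented for illustration.

```python
from collections import Counter
from itertools import combinations

MIN_SUPPORT = 0.05     # initial minimum support used in the thesis (5%)
MIN_CONFIDENCE = 0.40  # initial minimum confidence used in the thesis (40%)


def item_frequencies(transactions):
    """Frequency statistics: share of records that contain each single item."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c / n for item, c in counts.items()}


def pair_rules(transactions, min_support=MIN_SUPPORT,
               min_confidence=MIN_CONFIDENCE):
    """Association rules A -> B from frequent 2-itemsets.

    support(A, B)      = share of records containing both A and B
    confidence(A -> B) = support(A, B) / support(A)
    All pairs are counted directly (no Apriori candidate pruning).
    """
    n = len(transactions)
    single = item_frequencies(transactions)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


def rules_between(rules, lhs_field, rhs_field):
    """Keep rules whose items come from the given fields, e.g. herb -> symptom."""
    return [r for r in rules
            if r[0].split(":", 1)[0] == lhs_field
            and r[1].split(":", 1)[0] == rhs_field]


def top_k(rules, by="support", k=10):
    """Rank rules by support or by confidence and keep the top k."""
    index = 2 if by == "support" else 3
    return sorted(rules, key=lambda r: r[index], reverse=True)[:k]


if __name__ == "__main__":
    # Invented toy records with placeholder items, not data from the thesis.
    records = [
        {"herb:A", "herb:B", "symptom:S1"},
        {"herb:A", "herb:C", "symptom:S1"},
        {"herb:A", "herb:B", "syndrome:Z1"},
        {"herb:D", "syndrome:Z2"},
    ]
    print(item_frequencies(records)["herb:A"])  # 0.75, a single-herb frequency
    rules = pair_rules(records)
    for lhs, rhs, sup, conf in top_k(rules_between(rules, "herb", "herb"),
                                     by="confidence"):
        print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

The contrast the thesis draws is visible here: `item_frequencies` can only say how often herb A appears on its own, while `pair_rules` expresses a relationship between two items (for example, B -> A holds with confidence 1.0 in the toy data). Direct pair counting is adequate for 2-itemsets; for larger itemsets, the candidate pruning of a full Apriori implementation, such as WEKA's, becomes important.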
The main difference between the two methods is that data mining expresses a relationship between two Chinese medicinal herbs, whereas frequency statistics can only count how often a single herb appears.

3 Conclusion

Applying data mining to TCM informatics analyzes the information from a new perspective and can serve as a complementary method for acquiring knowledge; it merits further research. Through data mining, new knowledge not contained in classical theory was obtained. In this research, the original data (Chinese medicinal herbs, symptoms and syndromes) were standardized and a systematic procedure was established; this work is helpful for the standardization of TCM information. The results include three kinds of frequent itemsets: "Chinese medicinal herb", "Chinese medicinal herb and symptom" and "Chinese medicinal herb and syndrome". These results can provide a reference for clinical practice and are quite beneficial for prescribing.

4 Prospect

The data mining procedure could be developed into a mature tool, so that scholars become as familiar with it as with traditional statistical tools. TCM informatics need not be limited to formulas and disease records; it can be extended to broader fields. Basic work on terminology standards is urgently needed to build larger and more effective databases, which will make processing more scientific and promote the accumulation of data. In the era of information explosion, TCM needs to be developed with more modern technology. In TCM informatics, TCM will be combined with computer technology, and new perspectives and tools for analyzing TCM information will bring new knowledge and open new ways to advance TCM.
Keywords/Search Tags:Traditional Chinese medicine, Gynecology, Disease record, Data mining, Knowledge discovery in databa