With the support of national policy, much of the clinical diagnosis and treatment experience of veteran Chinese medicine experts has been preserved through collation and summarization, and many publications documenting this experience have appeared recently. The mass production of digital documents in the field of acupuncture and moxibustion has made the traditional approach of acquiring acupuncture and moxibustion knowledge by manual effort infeasible. In this context, using natural language processing technology to automatically extract domain-specific information, such as terminology, entity relations, and events, from large volumes of unstructured Chinese medicine literature is of both practical and theoretical significance. This paper studies techniques for automatically extracting TCM acupuncture and moxibustion information in accordance with the characteristics of its texts, and mainly completes the following work:

(1) A seed-set-based domain term extraction model is established in accordance with the characteristics of TCM acupuncture and moxibustion terms. The model first iterates over a limited seed set to generate a set of term components for the field; it then uses this component set as the domain dictionary and applies the forward maximum matching algorithm to segment sentences in the TCM acupuncture literature and extract candidate terms; finally, it applies linguistic rules to filter the candidates and select the terms belonging to the field. In the experiments, keyword sets are used as seed sets, and the F-measure of term extraction in the open test is 77.29%.

(2) We select effective lexical, syntactic, and semantic features to build a feature template, and vectorize entity relation instances according to their context in the field of TCM acupuncture and moxibustion. We then use a support vector machine to train entity relation classification models for the field. The experimental results show that these models perform well on entity relation extraction: the F-measures of the DM, HM, and DRM relation classification models are 93.25%, 87.19%, and 84.57%, respectively.

(3) We collect the manually labeled event trigger words from the training corpus, construct a TCM acupuncture event trigger word list, extend the list with the Tongyici Cilin, and identify candidate event trigger words based on the extended list. In accordance with the characteristics of word usage in the TCM acupuncture field, we define filtering rules for candidate trigger words and construct an event trigger word recognition model that combines dictionary matching with rule filtering. The results show that the model recognizes trigger words well; the F-measure for treatment event trigger word recognition is 88.28%.

To standardize the management and storage of information in the field of TCM acupuncture and moxibustion, we apply these information extraction results to the construction of a domain knowledge base, providing a unified big-data platform for specific applications such as assisted teaching, assisted diagnosis and treatment, and knowledge discovery research.
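
As an illustration of the segmentation step in item (1), the following is a minimal Python sketch of forward maximum matching against a component dictionary. The dictionary contents, the example sentence, the function names `forward_max_match` and `candidate_terms`, and the choice to join adjacent dictionary components into a single candidate term are all assumptions made for illustration; they are not taken from the paper, and the rule-based filtering step is not shown.

```python
def forward_max_match(sentence, dictionary, max_len=None):
    """Segment `sentence` left to right, always taking the longest dictionary match.

    Characters that start no dictionary entry are emitted as single-character tokens.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest window first and shrink until a dictionary entry matches.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def candidate_terms(tokens, dictionary):
    """Join runs of adjacent dictionary components into candidate terms.

    This joining rule is a hypothetical simplification, not the paper's rule set.
    """
    candidates, run = [], []
    for tok in tokens:
        if tok in dictionary:
            run.append(tok)
        else:
            if run:
                candidates.append("".join(run))
            run = []
    if run:
        candidates.append("".join(run))
    return candidates


# Hypothetical component set and sentence, for illustration only.
components = {"针刺", "足三里", "穴", "胃痛"}
tokens = forward_max_match("针刺足三里穴治疗胃痛", components)
terms = candidate_terms(tokens, components)
print(tokens)  # ['针刺', '足三里', '穴', '治', '疗', '胃痛']
print(terms)   # ['针刺足三里穴', '胃痛']
```

Trying the longest window first is what makes the match "maximum": a longer component such as 足三里 is preferred over any shorter prefix, so the quality of the segmentation depends directly on the coverage of the component dictionary generated from the seed set.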