Information Extraction Of Patients’ Self-Description Aiming At Online Diagnosis

Posted on: 2016-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: L Q Ning
Full Text: PDF
GTID: 2285330467490748
Subject: Foreign Linguistics and Applied Linguistics
Abstract/Summary:
Online diagnosis, which lets patients consult a doctor without leaving home, has become an increasingly popular medical service. However, only a limited number of doctors are available to provide it. To address this problem, this thesis builds a system that extracts useful information from patients’ self-descriptions using pattern-matching-based information extraction. The study uses self-description texts written by patients with digestive diseases, and it consists of three parts:
(1) Error-tolerant processing of online texts. To detect and correct errors in patients’ descriptions, typographical errors are classified into six types with different features, which fall into two groups: global features and local (partial) features. Three corpus-based parameters are introduced to capture these features: a similarity function based on Levenshtein distance, the N-gram probability difference, and the mutual information (MI) difference. The algorithm proved effective on the test set.
(2) TCM syndrome knowledge base. The knowledge base guides information extraction. It is constructed according to framework theory and Traditional Chinese Medicine (TCM) textbooks, and the final version contains 28 digestive syndromes.
(3) Manual pattern acquisition and extension. This part deals with the typical patterns found in the texts. A corpus-based construction study is used to acquire the patterns, and HIT-CIR Tongyici Cilin (Extended) is used to extend them. The information extracted by pattern matching is then compared with the knowledge base, and the system makes the final diagnosis automatically from this comparison. The 567 patterns obtained in this way proved effective on the test set and therefore support the system’s automatic diagnosis.
Drawing on linguistic theories and corpus methods, this study produces a comparatively effective information extraction system that increases the automation of online diagnosis.
Several factors that influence system performance are also discussed; these call for further study.
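The abstract names the three corpus-based parameters of part (1) but does not say how they are computed. The Python sketch below shows one plausible reading. It rests on several assumptions:

- The typo is a single-character substitution.
- The N-gram probability difference uses a character bigram model with add-one smoothing.
- The MI difference uses positive pointwise mutual information (PPMI) between a character and its left neighbour.

The toy corpus, the helper `score_candidate`, and the example typo 胃腾 (a homophone slip for 胃疼, "stomach ache") are hypothetical illustrations. They are not data or code from the thesis.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (0 if equal)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein-based similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


class CharBigramModel:
    """Character unigram/bigram counts from a (hypothetical) patient-text corpus."""

    def __init__(self, corpus):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sent in corpus:
            self.unigrams.update(sent)
            self.bigrams.update(zip(sent, sent[1:]))
        self.total = sum(self.unigrams.values())
        self.total_bigrams = sum(self.bigrams.values())
        self.vocab = len(self.unigrams) + 1  # +1 reserves a slot for unseen characters

    def log_prob(self, text: str) -> float:
        """Sum of log P(c_i | c_{i-1}) with add-one (Laplace) smoothing."""
        return sum(
            math.log((self.bigrams[(p, c)] + 1) / (self.unigrams[p] + self.vocab))
            for p, c in zip(text, text[1:])
        )

    def ppmi(self, x: str, y: str) -> float:
        """Positive PMI of the adjacent pair (x, y) from raw counts; unseen pairs score 0."""
        joint = self.bigrams[(x, y)]
        if joint == 0:
            return 0.0
        p_xy = joint / self.total_bigrams
        p_x = self.unigrams[x] / self.total
        p_y = self.unigrams[y] / self.total
        return max(0.0, math.log(p_xy / (p_x * p_y)))


def score_candidate(model, text, pos, candidate):
    """Hypothetical helper: the three parameters for replacing text[pos] with candidate."""
    fixed = text[:pos] + candidate + text[pos + 1:]
    sim = similarity(text, fixed)
    ngram_diff = model.log_prob(fixed) - model.log_prob(text)
    mi_diff = 0.0
    if pos > 0:  # association with the left neighbour, corrected minus original
        left = text[pos - 1]
        mi_diff = model.ppmi(left, candidate) - model.ppmi(left, text[pos])
    return sim, ngram_diff, mi_diff


# Toy corpus and typo, invented for illustration only.
corpus = ["胃疼三天", "肚子疼", "饭后胃疼", "胃胀反酸"]
model = CharBigramModel(corpus)
sim, ngram_diff, mi_diff = score_candidate(model, "胃腾两天", 1, "疼")
print(f"similarity={sim:.2f} ngram_diff={ngram_diff:.3f} mi_diff={mi_diff:.3f}")
# similarity=0.75 ngram_diff=0.875 mi_diff=1.514
```

A positive N-gram or MI difference means the candidate fits the corpus better than what the patient typed. PPMI on raw counts is used here instead of smoothed PMI, because on a small corpus smoothed PMI tends to inflate scores for rare, unseen characters. The abstract does not say how the thesis weights these parameters or sets thresholds for its six error types, so the sketch stops at computing them.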
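Part (3) of the abstract chains three steps: pattern matching, synonym-based pattern extension, and comparison with the frame-based knowledge base. The following Python sketch illustrates that pipeline under its own assumptions:

- `SYNONYMS` is a tiny hypothetical stand-in for HIT-CIR Tongyici Cilin (Extended).
- The seed patterns and the English input are invented for readability; the thesis works on Chinese text.
- `syndrome_A` and `syndrome_B` are placeholders, not real TCM syndromes.
- Ranking frames by the share of filled symptom slots is an assumed scoring rule.

```python
import re

# Hypothetical synonym sets standing in for HIT-CIR Tongyici Cilin (Extended) entries.
SYNONYMS = {
    "stomach": ["belly", "tummy"],
    "hurts": ["aches", "pains"],
    "bloated": ["distended"],
}

# Hypothetical seed patterns: template -> symptom slot value; {term} marks an extensible word.
SEED_PATTERNS = [
    ("my {stomach} {hurts}", "epigastric pain"),
    ("feel {bloated}", "abdominal distension"),
    ("acid reflux", "acid regurgitation"),
]

# Hypothetical frame-style knowledge base: syndrome frame -> symptom slots (placeholders).
KNOWLEDGE_BASE = {
    "syndrome_A": {"epigastric pain", "acid regurgitation"},
    "syndrome_B": {"abdominal distension", "poor appetite"},
}


def compile_pattern(template: str) -> re.Pattern:
    """Extend a seed template by replacing each {term} with an alternation of its synonyms."""
    parts = re.split(r"\{(\w+)\}", template)  # even indices: literal text; odd: term names
    regex = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex.append(re.escape(part))
        else:
            words = [part] + SYNONYMS.get(part, [])
            regex.append("(?:" + "|".join(map(re.escape, words)) + ")")
    return re.compile(r"\b" + "".join(regex) + r"\b", re.IGNORECASE)


PATTERNS = [(compile_pattern(t), symptom) for t, symptom in SEED_PATTERNS]


def extract(text: str) -> set:
    """Return the symptom slot values whose extended pattern matches the text."""
    return {symptom for pattern, symptom in PATTERNS if pattern.search(text)}


def diagnose(symptoms: set):
    """Rank frames by the share of their slots filled; None if no slot is filled."""
    scores = {name: len(slots & symptoms) / len(slots) for name, slots in KNOWLEDGE_BASE.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


text = "My belly aches after meals, and I get acid reflux at night."
symptoms = extract(text)
print(sorted(symptoms))    # ['acid regurgitation', 'epigastric pain']
print(diagnose(symptoms))  # ('syndrome_A', {'syndrome_A': 1.0, 'syndrome_B': 0.0})
```

Extending patterns with synonym alternations lets one seed pattern cover several surface forms, such as "my belly aches" and "my tummy pains", without writing each one by hand. This is roughly the role the abstract gives to the synonym dictionary. A real system would also need negation handling and a principled tie-breaking rule, which this sketch omits.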
Keywords/Search Tags: Information Extraction, Pattern, Corpus, Traditional Chinese Medicine