
Research On Phenotype Concept Spectrum Extraction From Clinical Records Of Traditional Chinese Medicine

Posted on: 2021-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Q G Zheng
GTID: 2404330614971013
Subject: Software engineering
Abstract/Summary:
A clinical medical record is the text generated during medical activities. Usually written by medical personnel, it documents the occurrence, development, examination, diagnosis, and treatment of a patient's disease. Clinical medical records contain a wealth of information, and mining it provides important data for clinical studies. A phenotype spectrum is the set of characteristics mentioned in a patient's medical record, and it is usually extracted with named entity recognition methods. Because the boundaries of phenotype mentions are ambiguous and phenotype spectrum entities are not standardized, the extraction results are difficult to apply directly to downstream tasks. The phenotype concept spectrum is a phenotype spectrum expressed as concepts: extracting it means identifying, directly from the medical record text, the phenotypes that correspond to a predefined concept set.

This thesis studies methods for extracting the phenotype concept spectrum from TCM clinical text. First, we developed a "Human-Machine Collaborative Phenotype Spectrum Annotating System" for annotating the phenotype spectrum of medical records efficiently. Next, we propose a named entity recognition method based on active learning that can drive sample recommendation in the annotating system. Finally, we propose two methods for automatically extracting the phenotype concept spectrum. The research consists of the following three parts:

(1) Development of a "Human-Machine Collaborative Phenotype Spectrum Annotating System" for rapid structuring of clinical medical texts. To reduce redundant manual labeling, the system combines manual labeling with automatic labeling: it preferentially recommends samples with higher labeling value for manual labeling and automatically labels entities that are easy to identify. This thesis presents the design and development of the annotating system.

(2) A named entity recognition method based on active learning, suitable for use in the Human-Machine Collaborative Phenotype Spectrum Annotating System, together with an improved query strategy for active learning. We evaluated entity recognition performance and the sample recommendation strategies experimentally and compared several existing query strategies. The experiments show that an uncertainty-based active learning named entity recognition model reaches 99% of the best performance with only about 35% of the labeled samples, whereas the random strategy reaches only 88% with the same number of labeled samples.

(3) Two methods for extracting the phenotype concept spectrum of clinical medical records.
1) A pipeline method. A deep learning model based on BERT + CRF extracts the phenotype spectrum from the medical record text, and rules then map the extracted phenotype spectrum to a predefined concept set. The phenotype spectrum extraction model was evaluated on the CCKS2019 Chinese electronic medical record medical entity recognition standard data set, where its Micro F value reached 0.8309.
2) A multi-label text classification approach. A composite document-level representation, D2V-TFIDF, is constructed by concatenating the dense semantic Doc2Vec representation with the sparse TFIDF representation. One-vs-Rest multi-label classification with SVM is then performed on the D2V-TFIDF features of each medical record sample. Experiments show that the multi-label text classification method based on the D2V-TFIDF representation achieves a Micro F-measure of 0.63 on Chinese clinical medical record text, higher than the D2V representation (0.18) and the TFIDF representation (0.61).
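
The uncertainty-based sample recommendation described in part (2) can be illustrated with a small sketch. The abstract does not specify the uncertainty measure or the improved query strategy, so the code below assumes a simplified, token-averaged least-confidence score; the function names, record ids, and probabilities are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Sequence


def least_confidence(token_probs: Sequence[Sequence[float]]) -> float:
    """Uncertainty of one sample: 1 minus the mean top label probability per token.

    This is an assumed, simplified measure, not the thesis's query strategy.
    """
    if not token_probs:
        return 0.0
    top = [max(p) for p in token_probs]
    return 1.0 - sum(top) / len(top)


def recommend(pool: Dict[str, Sequence[Sequence[float]]], k: int) -> List[str]:
    """Return the ids of the k most uncertain unlabeled samples, most uncertain first."""
    ranked = sorted(pool, key=lambda sid: least_confidence(pool[sid]), reverse=True)
    return ranked[:k]


# Hypothetical per-token label distributions from a tagger (made-up numbers).
pool = {
    "rec_a": [[0.98, 0.01, 0.01], [0.95, 0.03, 0.02]],
    "rec_b": [[0.40, 0.35, 0.25], [0.50, 0.30, 0.20]],
    "rec_c": [[0.80, 0.15, 0.05], [0.70, 0.20, 0.10]],
}
print(recommend(pool, 2))  # ['rec_b', 'rec_c']
```

In this sketch the samples the tagger is least sure about go to the human annotator first, while confidently tagged samples such as `rec_a` could be left to automatic labeling.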
Keywords/Search Tags: clinical records, phenotype spectrum, concept extraction, human-machine collaboration, active learning