
A Study On Medical Disease Phenotype Entities And The Relationship Extraction Methods

Posted on: 2020-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: N Yuan
Full Text: PDF
GTID: 2404330575995202
Subject: Computer Science and Technology
Abstract/Summary:
With the spread of information and digital technology, the medical field has accumulated a large amount of digital data. However, most medical knowledge and data still exist as unstructured text, such as clinical electronic medical records, ancient Chinese medical books (for example the Yellow Emperor's Inner Classic, the Treatise on Cold Damage, and the Compendium of Materia Medica), and the modern medical literature. Extracting structured information from this large-scale text is a prerequisite for in-depth medical analysis and utilization, and it is also one of the main bottlenecks in medical data mining. This thesis treats the extraction of phenotypic entities and of their relationships together. Clinical medical records, ancient TCM books, and PubMed bibliographic data were manually standardized to build standard datasets for information extraction, and methods for phenotypic named entity recognition and relationship extraction were then studied. The main research work covers the following three aspects.

Firstly, a standard dataset of 10,426 entries for biopsy phenotypic entity recognition was constructed. Conditional Random Fields (CRF) and the Structured Support Vector Machine (SSVM) were used for entity extraction, and traditional features were compared with deep representations. Performance also differed between the word feature learning methods (Word2Vec and Node2Vec). The experiments show that the traditional feature-based CRF method reaches an F1 value of 0.83, while the CRF and SSVM methods based on Word2Vec word vectors reach 0.9798 and 0.9908, respectively. The F1 values based on Node2Vec word vectors reach 0.8879 and 0.9413, respectively. The F1 values of the vectors reached 0.9752 and 0.9788, respectively. Named entity recognition based on deep representations therefore outperforms the traditional feature-based algorithm and essentially reaches a practical level (F1 value > 0.95), and SSVM outperforms CRF. The SSVM based on Node2Vec deep word features also achieves good performance because no word segmentation is required.

Secondly, targeting phenotypic relationships in the English bibliographic literature, a standard dataset from PubMed (8,991 sample records) covering four kinds of relationships was constructed. Using word features and sentence features, Convolutional Neural Networks (CNN) and a CNN with multiple convolution kernels (CNNs) were applied to relationship extraction. The experiments found that the CNN fusing word features and sentence features reached an F1 value of 0.7494, while the CNNs method reached 0.8039. Fusing the features raised F1 by 4.63 percentage points over the word-based CNN (F1 value 0.7031), and the CNNs method added a further 5.45 points.

Thirdly, a standard dataset of ancient books (81,908 samples) covering 10 relationship types was constructed, and the BiGRU algorithm combined with an attention mechanism, as well as the BiLSTM algorithm, was used for relationship extraction. The results show that the BiGRU+Attention algorithm reaches an F1 value of 0.9486, while the BiLSTM algorithm reaches 0.9017 with the WF feature and 0.9232 with the WF+PF features. The BiGRU algorithm therefore performs better than the BiLSTM algorithm.
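
The abstract reports every result as an F1 value but does not say how it was computed. As a rough illustration only, the sketch below assumes exact-match, entity-level scoring over BIO-tagged sequences, which is a common convention for named entity recognition. The tag names `SYM` and `DIS` and the helper names `bio_to_spans` and `entity_f1` are hypothetical and do not come from the thesis.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); types are hypothetical.
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence.

    An I- tag that does not continue an open entity of the same type
    is ignored (assumption: strict BIO, no repair of stray I- tags).
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) with exact span-and-type matching."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = bio_to_spans(gold_tags)
        pred_spans = bio_to_spans(pred_tags)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example: the prediction finds one of the two gold entities.
    gold = [["B-SYM", "I-SYM", "O", "B-DIS"]]
    pred = [["B-SYM", "I-SYM", "O", "O"]]
    print(entity_f1(gold, pred))
```

Under this convention a partially matched entity counts as both a false positive and a false negative, so scores are stricter than token-level accuracy. Whether the thesis used entity-level or token-level scoring is not stated in the abstract.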
Keywords/Search Tags:Named Entity Recognition, Relationship Extraction, Phenotype Entity, Deep Neural Network, Text Mining