A Study On Medical Disease Phenotype Entities And The Relationship Extraction Methods

Posted on:2020-07-05

Degree:Master

Type:Thesis

Country:China

Candidate:N Yuan

Full Text:PDF

GTID:2404330575995202

Subject:Computer Science and Technology

Abstract/Summary:


With the spread of information and digital technology, a large amount of digital data has accumulated in the medical field. However, most medical knowledge and data still exist as unstructured text. Examples include clinical electronic medical records; ancient Chinese medical books such as the Huangdi Neijing (Yellow Emperor's Inner Classic), the Shanghan Lun (Treatise on Cold Damage Diseases), and the Compendium of Materia Medica; and modern medical literature. Extracting structured information from this large-scale text is a prerequisite for in-depth medical analysis and use. It is also one of the main bottlenecks in medical data mining. This thesis combines phenotype entity extraction with relationship extraction. We manually annotated clinical medical records, ancient TCM books, and PubMed bibliographic data to construct standard datasets for extraction. We then studied methods for phenotype named entity recognition and relationship extraction. The main research work covers three aspects.

First, a standard dataset of 10,426 biopsy phenotype entities was constructed for entity recognition. Conditional Random Fields (CRF) and Structured Support Vector Machines (SSVM) were used for entity extraction. Traditional features were compared with deep representations, and the performance of two feature-learning methods, Word2Vec and Node2Vec, was found to differ. The traditional feature-based CRF achieves an F1 value of 0.83. With Word2Vec word vectors, the CRF and SSVM methods reach F1 values of 0.9798 and 0.9908, respectively. With Node2Vec word vectors, they reach 0.8879 and 0.9413, respectively. The F1 values of the vectors reached 0.9752 and 0.9788, respectively. Deep representations therefore outperform traditional feature-based named entity recognition and essentially reach a practical level (F1 > 0.95). SSVM also outperforms CRF. The SSVM based on Node2Vec deep word features performs well too, and it has the advantage of not requiring word segmentation.

Second, the work targeted phenotype relationships in English bibliographic literature. A standard dataset of 8,991 sample records drawn from PubMed and covering four kinds of relationships was constructed. Using word features and sentence features, a Convolutional Neural Network (CNN) and a multi-convolution-kernel CNN (CNNs) were applied to relationship extraction. The CNN that fuses word and sentence features reached an F1 value of 0.7494, while the CNN method reached 0.8039. Compared with the word-based CNN (F1 value 0.7031), these are increases of 4.63% and 5.45%, respectively.

Third, a standard dataset of ancient books containing 10 relationship types (81,908 samples) was constructed. BiGRU combined with an attention mechanism (BiGRU+Attention) and BiLSTM were used for relationship extraction. BiGRU+Attention reaches an F1 value of 0.9486. BiLSTM reaches 0.9017 with WF features and 0.9232 with WF+PF features. BiGRU therefore outperforms BiLSTM on this task.
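Every result above is reported as an F1 value, and F1 > 0.95 is treated as the practical threshold for entity recognition. The abstract does not state whether F1 is computed per entity or per token. The sketch below assumes the common exact-match, entity-level convention over BIO tags. Under this convention, a predicted entity counts as correct only if its boundaries and type both match the gold annotation. The tag type `PHE` and the functions `bio_to_spans` and `entity_f1` are hypothetical illustrations, not the thesis's evaluation code.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical span representation: (start, end_exclusive, entity_type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of entity spans.

    Assumption: an I- tag that does not continue an entity of the same type
    starts a new entity (a lenient convention; strict schemes differ).
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last entity
        opens_or_closes = (
            tag == "O"
            or tag.startswith("B-")
            or (tag.startswith("I-") and tag[2:] != etype)
        )
        if opens_or_closes:
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if tag != "O":
                start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall, and F1 (exact match)."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # same boundaries and same type
        fp += len(p - g)  # predicted but not in gold
        fn += len(g - p)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example with a hypothetical phenotype tag "PHE"
gold = [["B-PHE", "I-PHE", "O", "B-PHE"], ["O", "B-PHE", "I-PHE"]]
pred = [["B-PHE", "I-PHE", "O", "O"], ["O", "B-PHE", "O"]]
p, r, f1 = entity_f1(gold, pred)
print(f"{p:.4f} {r:.4f} {f1:.4f}")  # prints "0.5000 0.3333 0.4000"
```

In the toy example, the second prediction finds only part of a two-token phenotype. Under exact matching it counts as both a false positive and a false negative. This is why entity-level F1 is usually stricter than token-level F1.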

Keywords/Search Tags:

Named Entity Recognition, Relationship Extraction, Phenotype Entity, Deep Neural Network, Text Mining


Related items

1	Research On Named Entity Recognition And Entity Relationship Extraction Of Medical Data Text Based On Attention
2	Biomedical Named Entity Recognition Based On Local Feature Enhancement
3	Chinese Medical Text Entity Recognition Based On Deep Neural Network
4	Research On Named Entity Recognition And Normalization From Biomedical Text
5	Medical Text Information Extraction Based On Deep Learning
6	Named Entity Recognition In Medical Texts Based On Recurrent Neural Network
7	GAN-based Named Entity Recognition For TCM Text
8	Research And Realization Of Medical Case Automatic Generation Based On Named Entity Recognition
9	Research On Biomedical Named Entity Recognition And Relation Extraction Based On Neural Network
10	Named Entity Recognition In Medical Field Based On Deep Learning Of Chinese