Research On Event Extraction And Interpretability Based On Tumor Disease Inspection Report

Posted on: 2022-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: X L Chen
GTID: 2504306779464144
Subject: Computer Software and Application of Computer
Abstract/Summary:
Malignant tumors have become one of the diseases that most seriously endanger human health. A large number of inspection reports are generated during the treatment of tumor diseases. Medical texts such as X-ray imaging reports, ultrasound imaging reports, and CT inspection reports are the doctor's objective record of the disease and an important basis for diagnosis. Extracting the content that doctors are interested in from massive numbers of reports and presenting it in a structured form is therefore particularly important for assisting diagnosis and treatment. To address this problem, this thesis takes tumor disease inspection reports as its research object, proposes an event extraction model based on machine reading comprehension, and conducts an interpretability study of that model. The main contents are as follows:

(1) Generating a machine reading comprehension dataset from tumor disease inspection reports. The inspection reports are preprocessed to produce a dataset that machine reading comprehension methods can use. First, data cleaning handles the unlabeled data, irrelevant tags, and garbled text found in the reports. The dataset is then generated: each inspection report is treated as the passage, questions are generated from the attributes marked in the report, and the attribute values are marked as answers by an annotation algorithm. This yields the dataset Tumor QA. Finally, text reorganization and Chinese-English back translation are used to generate the augmented dataset Tumor QA+.

(2) Proposing a method for tumor disease event extraction based on machine reading comprehension. Tumor disease event extraction is defined as a machine reading comprehension task. A question answerability discriminator, Judger, is built on BERT to determine whether a question can be answered; its main role is to assist the span selection model in obtaining answers. In the span selection model, the encoding of each word in the medical text is taken from the last transformer layer of BERT, and an attention mechanism then selects the answer-related information from these encodings. This information is fed into a bidirectional LSTM to obtain a global representation of the text. Finally, the global representation is passed through a softmax function to select the answer span, which completes the event extraction. In addition, a new loss function that incorporates Judger is designed for the span selection model; it avoids extracting answers for unanswerable questions and improves the efficiency of training.

(3) Research on the interpretability of the event extraction model. Interpretability analysis is carried out both before and after modeling. Before modeling, the research data are analyzed to obtain the overall distribution characteristics of the dataset and to explain the relationship between the model's predictions and the dataset distribution; the analysis finds that predictions are affected by the input data when the dataset is small. After modeling, hidden layer analysis makes the information flow between network layers visible. Specifically, the integrated gradients of the data in the model and the Shannon divergence between the layers of the model are calculated in order to explain the influence of each network on the model's decision.
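The dataset generation in step (1), where a report becomes the passage, a question is generated from each marked attribute, and the attribute value is marked as the answer, could be sketched roughly as follows. This is a minimal illustration and not the thesis's annotation algorithm: the question template, the record field names (borrowed from the common SQuAD-style layout), the exact-match search, and the English sample report are all assumptions made for the example. Labeling an attribute whose value cannot be located as unanswerable is likewise an assumption, included only because the abstract mentions an answerability discriminator.

```python
from typing import Dict, List, Optional

# Hypothetical question template; the abstract does not give the wording used.
QUESTION_TEMPLATE = "What is the {attribute} described in the report?"


def find_answer_span(passage: str, value: str) -> Optional[int]:
    """Return the character offset of the first exact match of value, or None.

    Exact string matching stands in for the annotation algorithm (assumption).
    """
    if not value:
        return None
    start = passage.find(value)
    return start if start != -1 else None


def build_records(passage: str, attributes: Dict[str, str]) -> List[dict]:
    """Turn one annotated report into passage/question/answer records."""
    records = []
    for attribute, value in attributes.items():
        start = find_answer_span(passage, value)
        records.append({
            "context": passage,
            "question": QUESTION_TEMPLATE.format(attribute=attribute),
            # An empty answer list marks the question as unanswerable.
            "answers": [] if start is None
            else [{"text": value, "answer_start": start}],
            "is_impossible": start is None,
        })
    return records


# Invented sample data, for illustration only.
report = "CT shows a 2.3 cm nodule in the right upper lobe."
labels = {
    "tumor size": "2.3 cm",
    "tumor location": "right upper lobe",
    "metastasis site": "liver",  # not present in the report text
}
records = build_records(report, labels)
for record in records:
    print(record["question"], record["answers"], record["is_impossible"])
```

Storing the character offset together with the answer text keeps each record checkable: slicing the passage at `answer_start` must reproduce the answer exactly, which is a cheap sanity check to run over a generated dataset before training.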
Keywords/Search Tags: medical text, event extraction, machine reading comprehension, question answerability, interpretability