
Research And Application Of Structuring Lung Cancer Diagnosis Text Based On Machine Reading Comprehension

Posted on: 2024-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wu
Full Text: PDF
GTID: 2544307076992929
Subject: Computer technology
Abstract/Summary:
The pathological diagnosis recorded in electronic medical records is an important source of information in the medical field. A pathological diagnosis usually consists of unstructured textual descriptions and diagnostic conclusions, and it is difficult to mine important information from such unstructured text. A structured representation of pathological diagnosis text is of great significance for assisting clinical decision-making, disease prevention, and early diagnosis. However, the task of structuring lung cancer diagnostic text still faces two challenges: (1) there are many types of attributes to extract, and the data contains little descriptive information for each type, making it difficult to extract all attributes; (2) the model cannot adapt to new attribute extraction tasks that may arise. The main contents of this thesis are as follows.

(1) We propose a multi-task attribute extraction model based on machine reading comprehension to extract attributes from lung cancer diagnosis text. First, to address the shortage of training data, the model's training data is constructed by concatenating a question with the diagnostic text; pairing questions for different attribute categories with each lung cancer diagnosis text expands the training data. Second, to reduce errors in locating answer boundaries, candidate words corresponding to the answer are added to the question. This gives the model more prompt information about the answer boundary, allowing it to consider the semantic context of the question and text more fully and to determine the boundary position of the answer more accurately. Finally, to improve overall attribute extraction, the model adopts multi-task learning: suitable auxiliary tasks and a main task are designed and trained at the same time, sharing the underlying encoder representations across tasks and using the auxiliary tasks to constrain the main task's weights. This leads to significant improvements in attribute extraction compared with the single-task machine reading comprehension attribute extraction model.

(2) We propose a continual learning attribute extraction model based on machine reading comprehension, which updates the model incrementally using a small amount of labeled data for new attributes. Because of data security concerns in the medical field, previously annotated data should not be annotated again for new attributes. Fully labeling new data is time-consuming, and as the amount of labeled data grows, labeling errors accumulate, which hinders subsequent model training. To let the model adapt to new attribute extraction tasks while avoiding full labeling of the data, this thesis introduces continual learning into traditional machine reading comprehension models. Only the new attributes that appear in the data need to be labeled, and the model is then updated by training on that data. For old attributes, knowledge distilled from a teacher model that has already learned them is transferred to a student model, consolidating the old attribute extraction knowledge and countering catastrophic forgetting. For the newly labeled attributes, the student model learns directly from the labels. This approach enables the student model to acquire knowledge about both old and new attributes and addresses the inability of machine reading comprehension models to be updated incrementally.

(3) Based on the models proposed above, we designed and implemented a prototype system for structuring lung cancer diagnosis text using a browser/server (B/S) architecture. The system extracts specified attribute values from unstructured lung cancer diagnosis reports and combines them with self-defined rules for the cancer diagnosis data to form structured pathology diagnosis data. Furthermore, the system incorporates continual learning, which supports internal updating of the preset models to adapt to new attribute extraction requirements.
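The data construction described in contribution (1) can be illustrated with a minimal sketch. It pairs every attribute question, extended with candidate words, with one diagnosis text, so that a single report yields one training example per attribute. The attribute names, question wording, candidate lists, the `Candidates:` phrasing, and the `[SEP]` separator (borrowed from BERT-style encoders) are all assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical attribute questions and candidate words (not from the thesis).
ATTRIBUTE_QUESTIONS: Dict[str, Tuple[str, List[str]]] = {
    "histological_type": (
        "What is the histological type?",
        ["adenocarcinoma", "squamous cell carcinoma"],
    ),
    "differentiation": (
        "What is the degree of differentiation?",
        ["well", "moderately", "poorly"],
    ),
}


@dataclass
class MrcExample:
    attribute: str
    model_input: str  # question + candidate words + separator + diagnosis text
    answer_span: Optional[Tuple[int, int]]  # character offsets in the text


def build_examples(text: str, answers: Dict[str, str]) -> List[MrcExample]:
    """Create one MRC example per attribute for a single diagnosis text."""
    examples = []
    for attribute, (question, candidates) in ATTRIBUTE_QUESTIONS.items():
        # Candidate words are appended to the question as boundary hints.
        prompt = f"{question} Candidates: {', '.join(candidates)}."
        span = None
        answer = answers.get(attribute)
        if answer is not None:
            start = text.find(answer)
            if start >= 0:
                span = (start, start + len(answer))
        # Attributes without an answer keep span=None (unanswerable question).
        examples.append(MrcExample(attribute, f"{prompt} [SEP] {text}", span))
    return examples
```

Because every text is paired with every attribute question, the number of training examples grows with the number of attributes, which is the expansion effect the abstract refers to.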
Keywords/Search Tags:Span Extraction Reading Comprehension Model, Multi-task Learning, Continual Learning, Knowledge Distillation
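The continual learning and knowledge distillation idea named in the keywords and described in contribution (2) can be sketched as a per-question loss: questions about new attributes use the gold label, while questions about old attributes use the teacher's softened output distribution. This is a simplified illustration under stated assumptions, not the thesis's implementation. It scores only a single answer position (for example the span start), and the function names, the temperature value, and the use of KL divergence are choices made here.

```python
import math
from typing import List, Optional


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over token positions."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(student_logits: List[float], gold_index: int) -> float:
    """Negative log-likelihood of the labeled answer position."""
    return -math.log(softmax(student_logits)[gold_index])


def continual_loss(student_logits: List[float],
                   teacher_logits: Optional[List[float]] = None,
                   gold_index: Optional[int] = None,
                   temperature: float = 2.0) -> float:
    """New attributes learn from labels; old attributes learn from the teacher."""
    if gold_index is not None:
        return cross_entropy(student_logits, gold_index)
    if teacher_logits is not None:
        return distillation_loss(student_logits, teacher_logits, temperature)
    raise ValueError("need either a gold label or teacher logits")
```

In this sketch the old-attribute data never needs labels, since the teacher's predictions stand in for them; this matches the abstract's goal of labeling only the new attributes.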