Research And Application Of Structuring Lung Cancer Diagnosis Text Based On Machine Reading Comprehension

Posted on:2024-07-26

Degree:Master

Type:Thesis

Country:China

Candidate:Q Wu

GTID:2544307076992929

Subject:Computer technology

Abstract/Summary:

Pathological diagnoses recorded in electronic medical records are an important source of information in the medical field. A pathological diagnosis usually consists of unstructured textual descriptions and diagnostic conclusions, which makes it difficult to mine important information from it. A structured representation of pathological diagnosis text is of great significance for assisting clinical decision-making, disease prevention, and early diagnosis. However, structuring lung cancer diagnosis text still faces two challenges: (1) there are many types of attributes to extract, and the data contains little descriptive information for each type, making it difficult to extract all attributes; (2) the model cannot adapt to new attribute extraction tasks that may arise.

The main contents of this paper are as follows.

(1) We propose a multi-task attribute extraction model based on machine reading comprehension to extract attributes from lung cancer diagnosis text. First, to address insufficient training data, the model's training data is constructed by concatenating a question with the diagnostic text. Pairing the questions for the different attribute categories with each lung cancer diagnosis text expands the training data. Second, to reduce errors in extracting answer boundaries, candidate words corresponding to the answer are added to the question. This gives the model more prompt information about the answer boundary, so that it can consider the semantic context of the question and the text more fully and determine the boundary positions of the answer more accurately. Finally, to improve the overall attribute extraction performance, we adopt multi-task learning: appropriate auxiliary tasks and main tasks are designed and trained at the same time, the underlying encoder information is shared between tasks, and the auxiliary tasks are used to constrain the main task's weights. This leads to significant improvements in attribute extraction compared with the single-task machine reading comprehension attribute extraction model.

(2) We propose a continual learning attribute extraction model based on machine reading comprehension, which updates the model incrementally using a small amount of labeled data for new attributes. Because of data security concerns in the medical field, previously annotated data should not be annotated again for new attributes. Fully labeling new data is time-consuming, and as the amount of labeled data increases, labeling errors accumulate, which is not conducive to subsequent model training. To let the model adapt to new attribute extraction tasks while avoiding full labeling of the data, this paper introduces continual learning into the traditional machine reading comprehension model. Only the new attributes that appear in the data need to be labeled, and the model is then updated by training on that data. For the old attributes, knowledge distilled from a teacher model that has already learned them is transferred to a student model, which consolidates the old attribute extraction knowledge and counters catastrophic forgetting. For the newly labeled attributes, the student model learns directly from the labels. In this way the student model acquires knowledge about both old and new attributes, which addresses the inability of machine reading comprehension models to be updated incrementally.

(3) Based on the models proposed above, we designed and implemented a prototype system for structuring lung cancer diagnosis text using a B/S (browser/server) architecture. The system extracts specified attribute values from unstructured lung cancer diagnosis reports and combines them with self-defined rules for cancer diagnosis data to form structured pathology diagnosis data. Furthermore, the system incorporates continual learning, supporting internal updates of the pre-set models so that they can adapt to new attribute extraction requirements.
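The data construction described in contribution (1) can be illustrated with a minimal sketch. It pairs every attribute question, together with its candidate words, with one diagnosis text, so a single report yields one training sample per attribute. The attribute names, question wording, candidate words, `[SEP]` separator, and the `MrcSample` and `build_samples` names are all assumptions made for illustration; the abstract does not specify any of them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical questions and candidate answer words, one entry per attribute.
ATTRIBUTE_QUESTIONS: Dict[str, Tuple[str, List[str]]] = {
    "tumor_size": ("What is the size of the tumor?", ["cm", "mm"]),
    "histological_type": (
        "What is the histological type?",
        ["adenocarcinoma", "squamous cell carcinoma"],
    ),
    "lymph_node_metastasis": (
        "Is there lymph node metastasis?",
        ["metastasis", "no metastasis"],
    ),
}


@dataclass
class MrcSample:
    attribute: str
    input_text: str               # question + candidates + separator + report
    answer_start: Optional[int]   # character offset into input_text, or None
    answer_end: Optional[int]     # exclusive end offset, or None


def build_samples(
    report: str, answers: Dict[str, str], sep: str = " [SEP] "
) -> List[MrcSample]:
    """Create one span-extraction sample per attribute for a single report."""
    samples = []
    for attribute, (question, candidates) in ATTRIBUTE_QUESTIONS.items():
        # Candidate words are appended to the question as boundary hints.
        prefix = f"{question} Candidates: {', '.join(candidates)}.{sep}"
        start = end = None
        answer = answers.get(attribute)
        if answer is not None:
            pos = report.find(answer)  # locate the answer inside the report only
            if pos >= 0:
                start = len(prefix) + pos
                end = start + len(answer)
        samples.append(MrcSample(attribute, prefix + report, start, end))
    return samples


report = "Right upper lobe adenocarcinoma, tumor size 2.5 cm."
answers = {"tumor_size": "2.5 cm", "histological_type": "adenocarcinoma"}
samples = build_samples(report, answers)
for s in samples:
    span = None if s.answer_start is None else s.input_text[s.answer_start:s.answer_end]
    print(s.attribute, "->", span)
```

Attributes without an answer in the report keep `None` offsets, which a span-extraction model would typically treat as "no answer"; how the thesis handles that case is not stated in the abstract.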
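The teacher-student update in contribution (2) can be sketched as a combined loss. The abstract does not give the exact loss, so this example assumes a common formulation: cross-entropy against the gold answer position for newly labeled attributes, and a temperature-softened KL divergence between teacher and student distributions for old attributes. The function names, the `alpha` weight, the `temperature` value, and the batch layout are hypothetical.

```python
import math
from typing import Dict, List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over answer-position logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits: List[float], gold_index: int) -> float:
    """Supervised loss for a new attribute: learn directly from the label."""
    return -math.log(softmax(student_logits)[gold_index])


def distillation_kl(
    teacher_logits: List[float], student_logits: List[float], temperature: float = 2.0
) -> float:
    """KL(teacher || student) on softened distributions, used for old attributes."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def continual_loss(
    batch: List[Dict], alpha: float = 0.5, temperature: float = 2.0
) -> float:
    """Average loss over a batch that mixes old-attribute and new-attribute questions."""
    total = 0.0
    for item in batch:
        if item["is_new_attribute"]:
            total += cross_entropy(item["student_logits"], item["gold_index"])
        else:
            # No labels are needed here: the teacher's output is the target.
            total += alpha * distillation_kl(
                item["teacher_logits"], item["student_logits"], temperature
            )
    return total / len(batch)


batch = [
    {"is_new_attribute": True, "student_logits": [0.2, 1.5, 0.1], "gold_index": 1},
    {
        "is_new_attribute": False,
        "student_logits": [1.0, 0.5, 0.2],
        "teacher_logits": [2.0, 0.3, 0.1],
    },
]
print(continual_loss(batch))
```

Under this assumption, old-attribute questions need no labels at all, which matches the abstract's point that only new attributes have to be annotated.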

Keywords/Search Tags:

Span Extraction Reading Comprehension Model, Multi-task Learning, Continual Learning, Knowledge Distillation

Related items

1	Research And Application Of Medical Entity Extraction Based On Multi-task Learning And Transfer Learning
2	Multi-task Learning For Automated Quality Control Of Fetal Head Ultrasound Images
3	Drug Discovery On Large-Scale Knowledge Graphs Based On Multi-Task Learning
4	Research Of DDH Multi-target Detection Algorithm Based On Deep Learning
5	The Research Of Computer-aided Diagnosis In Chest Images Based On Multi-semantic Task And Multi-label Incremental Learning
6	Study On The Phenotypic Extraction Method Of Clinical Records Based On Multi-task Learning
7	Research On The Recommendation Algorithm Of Knowledge Graph And Multi-modal Knowledge Embedding In Telemedicine
8	Multi-task Model Based On Cortical Morphology Of The Cortex And Its Classification Research
9	Lung Segmentation On Clinical CT Images Based On Multi-Task Learning
10	Ultrasound Diagnosis Model Of Benign And Malignant Thyroid Nodules Based On Multi-Task Deep Learning