
Research On Medical Report Generation Based On Transformer

Posted on: 2024-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: W Jiang
Full Text: PDF
GTID: 2544307103974729
Subject: Computer Science and Technology
Abstract/Summary:
Medical report generation transforms the visual features of medical images into text through computational analysis. The task currently faces two main challenges. 1) Difficulty in modeling complex medical knowledge. The medical domain has a vast and intricate knowledge base that is difficult to master; a qualified physician typically requires at least seven years of systematic study plus significant clinical experience. In medical report generation, this difficulty manifests as long and complex output sequences, difficulty in extracting medical imaging features, and inaccurate descriptions of lesions. 2) The long-tail problem: a large amount of data is concentrated in a few categories, while most other categories contain far less data. In medical report generation, healthy instances greatly outnumber instances with lesions, and a few lesion types have significantly more instances than the rest. These two issues seriously degrade the performance of medical report generation. To address them, this thesis proposes two approaches.

For the first problem, this thesis proposes a medical report generation method based on a cross-modal contrastive learning framework. The framework uses text features to guide the extraction of visual features, addressing the difficulty of extracting features from images. It also introduces a shared text encoder module, and a novel text decoder that improves on Transformer and LSTM and combines their advantages, addressing the problem of long and complex sequence reports. In addition, it proposes a shared expression dictionary, which strengthens and records the cross-modal alignment between the report generation and report reconstruction processes, reconstructs and corrects the input features, removes noise contained in the different modalities, and yields more robust feature representations, thus addressing inaccurate lesion description.

For both the first and the second problem, this thesis proposes a medical report generation method that fuses a knowledge graph with diagnostic information. Building on the previous approach, the model is given a lightweight design to reduce its parameter count and improve efficiency, and a new text decoder is proposed. A new way of constructing the knowledge graph is also proposed: the graph classifies the instances in the medical report generation task, provides the basis for introducing diagnostic information, and supports a new loss function. In addition, a new framework that integrates diagnostic information is proposed, using a multi-label classification network as the visual encoder and feeding the predicted diagnostic information into the text decoder module, addressing difficult image feature extraction, the long-tail problem, and inaccurate lesion description.

The proposed methods are evaluated on two publicly available medical imaging datasets, IU X-ray and MIMIC-CXR. The experimental results show that they effectively improve the accuracy of the generated reports, raising the BLEU-4 scores on the two datasets to 0.203 and 0.115, respectively.
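The abstract does not specify the exact form of the cross-modal contrastive objective. A minimal sketch of one common choice, a symmetric InfoNCE-style loss over paired image/text embeddings where in-batch non-pairs serve as negatives, is shown below in pure Python (the function names and toy vectors are illustrative, not from the thesis):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(img_embs, txt_embs, temperature=0.1):
    """Symmetric InfoNCE loss: each image embedding should be closest to
    its paired report embedding (and vice versa); all other pairs in the
    batch act as negatives. Lower loss = better cross-modal alignment."""
    n = len(img_embs)
    # Pairwise similarity matrix, scaled by temperature.
    sims = [[cosine(img_embs[i], txt_embs[j]) / temperature
             for j in range(n)] for i in range(n)]
    loss = 0.0
    for i in range(n):
        # Image -> text direction: cross-entropy with target j == i.
        row = sims[i]
        loss += -row[i] + math.log(sum(math.exp(s) for s in row))
        # Text -> image direction: same, over the i-th column.
        col = [sims[k][i] for k in range(n)]
        loss += -col[i] + math.log(sum(math.exp(s) for s in col))
    return loss / (2 * n)
```

With orthogonal toy embeddings, correctly aligned pairs yield a near-zero loss, while shuffled pairings yield a large one, which is the signal a framework like the one described above would minimize during training.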
Keywords/Search Tags:Medical Report Generation, Cross-modal Contrastive Learning, Knowledge Graph, Multi-label Classification