Research On Medical Visual Question Answering Based On Sentence Structured Mapping And Biomedical Language Model

Posted on:2023-08-06

Degree:Master

Type:Thesis

Country:China

Candidate:Q Xiao

GTID:2530306614972639

Subject:Computer technology

Abstract/Summary:

Visual question answering (VQA) is a relatively new research field that differs from traditional vision tasks such as image classification, object detection and semantic segmentation. A VQA system takes an image and a free-form question about that image as input and is expected to output a correct answer to the question. It therefore combines computer vision, natural language processing and multimodal feature fusion. In the medical field, the "second opinion" provided by automated assistance systems can increase clinicians’ confidence when interpreting complex medical images, so medical VQA, as a new type of digital intelligent medical equipment, has considerable market potential.

This thesis studies visual question answering on medical images and investigates and improves the core components of a VQA system: image feature extraction, text feature extraction, multimodal feature fusion and answer prediction. The research contents are as follows.

Firstly, this thesis proposes the VGBM model (a medical visual question answering model using BioBERT pre-trained on a biomedical corpus). VGBM extracts image features by applying global average pooling to the outputs of some intermediate layers of a pre-trained VGG16 network, and extracts text features with a BioBERT model pre-trained on a medical text corpus. The two sets of features are then fused by a co-attention mechanism (MFH), and the fused features are fed into a classification layer to predict the answer.

Secondly, the VGBM model is further improved with a method based on sentence structure and an image attention mechanism: a channel attention mechanism is introduced into image feature extraction, and a sentence structure mapping method is used in text processing, which further improves the performance of the model.

Finally, the proposed method is evaluated on three medical visual question answering datasets (ImageCLEF 2019 VQA-Med, ImageCLEF 2020 VQA-Med and VQA-RAD). The experimental results show that the model achieves good results on all three datasets.
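
The image-feature step named in the abstract (global average pooling over intermediate layer outputs, followed later by channel attention) can be illustrated with a small sketch. This is not the thesis's implementation: nested Python lists stand in for real VGG16 feature maps, the function names are made up for illustration, and the per-channel `weights` argument is a hypothetical placeholder for the learned layers a real channel-attention module would use. BioBERT text encoding and MFH fusion are not shown.

```python
import math


def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) to a length-C vector."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def channel_attention(feature_map, weights):
    """Rescale each channel by a sigmoid gate computed from its pooled mean.

    `weights` is a hypothetical per-channel scalar (an assumption for this
    sketch); a trained module would learn this mapping instead.
    """
    pooled = global_average_pool(feature_map)
    gates = [1.0 / (1.0 + math.exp(-w * p)) for w, p in zip(weights, pooled)]
    return [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]


def image_features(layer_outputs):
    """Concatenate the pooled vectors of several intermediate layers."""
    features = []
    for feature_map in layer_outputs:
        features.extend(global_average_pool(feature_map))
    return features


# Toy stand-ins for two intermediate layer outputs (2 channels and 1 channel).
layer_a = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 0.0]]]
layer_b = [[[2.0, 4.0]]]

print(image_features([layer_a, layer_b]))  # [4.0, 0.0, 3.0]
```

Pooling each layer to one value per channel gives a fixed-length image vector regardless of the spatial size of the layer, which is what makes concatenating outputs from layers of different resolution straightforward.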

Keywords/Search Tags:

VQA, VQA-Med, Global average pooling, Attention mechanism, Sentence structure mapping

Related items

1	The Research On Colonoscopy Image Classification Method Based On Deep Convolutional Neural Network
2	Research And Application Of Graph Double Attention Network Based On Maximum Pooling
3	Network Representation Learning Algorithm Research Based On Attention Mechanism
4	Research On Independent-Subject Emotion Recognition Of EEG Based On Global Spatial Structure Attention
5	Conditions On Petroleum Pooling In Wendong Slope Belt
6	Research On Medical Image Segmentation Algorithm Based On Transformer And Comprehensive Attention Mechanism
7	Research On Medical Image Segmentation Algorithm Based On Encoder-decoder Structure And Attention Mechanism
8	Research On Prediction Method Of RNA Secondary Structure With Pseudoknots Based On Attention Mechanism
9	Road Extraction From High Resolution Remote Sensing Images Based On Dual Spatial Attention Mechanism
10	Self-attentive Moving Average For Time Series Prediction