| Visual question answering (VQA) is a relatively new research field that differs from traditional vision tasks such as image classification, object detection and semantic segmentation. A VQA system takes an image and a free-form question about that image as input and outputs a correct answer to the question. The task therefore combines computer vision, natural language processing and multimodal feature fusion. In the medical field, the "second opinion" provided by automated auxiliary systems can increase clinicians' confidence when interpreting complex medical images, so medical VQA, as a new type of digital intelligent medical equipment, has considerable market potential. This thesis studies visual question answering on medical images and investigates and improves the core components of a VQA system: image feature extraction, text feature extraction, multimodal feature fusion and answer prediction. The research contents are as follows. Firstly, this thesis proposes the VGBM model, a medical visual question answering model built on BioBERT pretrained on a biomedical corpus. Image features are extracted by applying global average pooling to the outputs of selected intermediate layers of a pretrained VGG16 network, and text features are extracted by a BioBERT model pretrained on a medical text corpus. The two sets of features are then fused by a co-attention mechanism (MFH), and the fused representation is passed to a classification layer that predicts the answer. Secondly, the VGBM model is further improved with a method based on sentence structure and image attention: a channel attention mechanism is introduced into image feature extraction, and a sentence structure mapping method is used in text processing, which further improves the performance of the model. Finally, the proposed method is evaluated on three medical visual question answering data sets (ImageCLEF 2019 VQA-Med, ImageCLEF 2020 VQA-Med and VQA-RAD). The experimental results show that the proposed model achieves good results on all three data sets. |
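
The fusion step described above can be illustrated with a minimal sketch of MFH-style pooling, assuming the commonly described formulation: each block projects both feature vectors into a shared expanded space, multiplies them element-wise, sum-pools over windows of size `k`, and normalizes, and each later block also multiplies in the expanded output of the previous block. The dimensions, the random weights and the names `mfh_fuse`, `random_matrix` and `blocks` are illustrative assumptions, not details taken from the thesis; dropout and the attention layers are omitted.

```python
import math
import random


def linear(weights, vec):
    """Multiply a (rows x len(vec)) weight matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sum_pool(vec, k):
    """Sum non-overlapping windows of size k (len(vec) must be divisible by k)."""
    return [sum(vec[i:i + k]) for i in range(0, len(vec), k)]


def normalize(vec):
    """Signed square-root (power) normalization followed by L2 normalization."""
    powered = [math.copysign(math.sqrt(abs(v)), v) for v in vec]
    norm = math.sqrt(sum(v * v for v in powered)) or 1.0
    return [v / norm for v in powered]


def mfh_fuse(image_feat, text_feat, blocks, k):
    """Cascade factorized bilinear blocks and concatenate their outputs.

    `blocks` is a list of (U, V) projection pairs, one pair per block.
    """
    outputs = []
    prev_expanded = None
    for U, V in blocks:
        # Expanded stage: element-wise product of the two projections.
        expanded = [a * b for a, b in zip(linear(U, image_feat), linear(V, text_feat))]
        # Higher-order step: reuse the previous block's expanded features.
        if prev_expanded is not None:
            expanded = [e * p for e, p in zip(expanded, prev_expanded)]
        # Squeeze stage: sum pooling, then normalization, then concatenation.
        outputs.extend(normalize(sum_pool(expanded, k)))
        prev_expanded = expanded
    return outputs


# Assumed toy sizes: 8-d image feature, 6-d text feature, 2 cascaded blocks.
rng = random.Random(0)
image_dim, text_dim, k, out_dim, order = 8, 6, 3, 4, 2
image_feat = [rng.random() for _ in range(image_dim)]
text_feat = [rng.random() for _ in range(text_dim)]
blocks = [
    (random_matrix(k * out_dim, image_dim, rng), random_matrix(k * out_dim, text_dim, rng))
    for _ in range(order)
]
fused = mfh_fuse(image_feat, text_feat, blocks, k)
print(len(fused))  # order * out_dim values, one normalized segment per block
```

In a full model the pooled image vector (for example, VGG16 feature maps after global average pooling) and the BioBERT sentence embedding would take the place of the random inputs, and `fused` would feed the answer classifier.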