
Multimodal Pretraining Model With Weak Supervision And Momentum Distillation

Posted on: 2024-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: H H Zhang
Full Text: PDF
GTID: 2568307118977849
Subject: Electronic information
Abstract/Summary:
By processing data from different modalities, multimodal learning can enhance the practical value of machine learning in real-life scenarios and enable models to better adapt to and handle complex tasks and situations. For image-text data, most existing multimodal pretraining models use external object detectors to extract image region features from large-scale datasets and then pretrain these regions against the corresponding text. However, existing multimodal learning methods do not handle scenarios with insufficient training data, missing bounding-box annotations, or noisy labels. To address these issues, this thesis studies a multimodal pretraining model that combines weak supervision with momentum distillation, consisting of the following two parts:

(1) To address insufficient training data and the lack of bounding-box annotations, a Transformer-based weakly supervised multimodal pretraining model is proposed. First, a weakly supervised object localization method is introduced to obtain region features of images. Then, a Transformer-based image-text encoder framework represents the multimodal features of medical images and their diagnostic reports. Alignment between images and text is achieved through pretraining tasks such as image-text contrastive learning, image-text matching, and masked language modeling. Finally, the proposed multimodal model is decoupled and applied to medical image classification tasks on several datasets; experimental results show its effectiveness.

(2) To address noisy datasets, a multimodal pretraining model based on momentum distillation is proposed. First, a weakly supervised object localization method obtains bounding boxes of target objects, which are then used to extract region image features. Next, momentum distillation is introduced to create a teacher model with the same structure as the student model. Region image-text pairs are fed to the teacher model to generate pseudo-targets, which serve as additional supervision for the student model's pretraining tasks. Finally, the proposed algorithm is applied to multiple medical image datasets; experimental results show its effectiveness.

This thesis has 25 figures, 12 tables, and 89 references.
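The abstract names image-text contrastive learning as one of the pretraining tasks in part (1) but gives no implementation details. Below is a minimal PyTorch sketch of a standard symmetric image-text contrastive (InfoNCE) objective of that kind; the function name, the temperature value, and the convention that matched pairs share a batch index are illustrative assumptions, not taken from the thesis.

```python
import torch
import torch.nn.functional as F

def image_text_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_feats, text_feats: (batch, dim) tensors from the two encoders;
    the pair at the same batch index is the positive match.
    """
    # L2-normalize so dot products become cosine similarities.
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are positive pairs.
    logits = image_feats @ text_feats.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

Aligning both directions pulls matched image-text pairs together while pushing apart the other pairs in the batch, which is what "alignment of image and text" refers to in part (1).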
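Part (2) describes momentum distillation: a teacher with the same structure as the student generates pseudo-targets that act as additional supervision. A common way to realize this (popularized by ALBEF-style models) keeps the teacher as an exponential moving average of the student and mixes the teacher's soft predictions into the loss. The sketch below follows that generic recipe; the momentum value, the mixing weight `alpha`, and all function names are assumptions for illustration, not the thesis's exact implementation.

```python
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, momentum=0.995):
    """Update the momentum (teacher) model as an exponential moving
    average of the student's parameters after each training step."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.data.mul_(momentum).add_(p_s.data, alpha=1 - momentum)

def distilled_loss(student_logits, teacher_logits, hard_targets, alpha=0.4):
    """Mix the hard-label loss with a soft pseudo-target loss.

    teacher_logits come from the momentum model on the same inputs and
    serve as additional, noise-tolerant supervision for the student.
    """
    hard_loss = F.cross_entropy(student_logits, hard_targets)
    # Soft pseudo-targets from the teacher; detached so no gradient
    # flows back into the momentum model.
    soft_targets = F.softmax(teacher_logits.detach(), dim=-1)
    soft_loss = -(soft_targets * F.log_softmax(student_logits, dim=-1)).sum(dim=-1).mean()
    return (1 - alpha) * hard_loss + alpha * soft_loss
```

In use, the teacher would be initialized as `teacher = copy.deepcopy(student)` and updated with `ema_update` after every optimizer step; because the soft targets average the student's recent history, they are less sensitive to noisy image-text pairs than the hard labels alone.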
Keywords/Search Tags: multimodal learning, pretraining model, weak supervision, momentum distillation