
Research On Multimodal Pre-Training Technology And Visualization Based On Lightweight Model

Posted on: 2023-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: T T Liu
Full Text: PDF
GTID: 2568306914971689
Subject: Intelligent Science and Technology

Abstract/Summary:
With the development of deep learning and high-performance computing resources, pre-training models based on the attention mechanism have achieved excellent results in natural language processing and multimodal learning. However, current pre-training models require large amounts of training data and are enormous in scale, which makes them costly to train and difficult to deploy on low-resource devices. This paper therefore studies multimodal pre-training with a lightweight model and a small amount of data. The specific work is as follows:

Based on the idea of curriculum learning, a new multi-stage pre-training method is proposed. Imitating the human learning process, it increases task difficulty in stages from simple to complex, so as to make better use of different types of data and improve learning performance. The proposed Multi-stage Pretraining (MSP) method pre-trains the model in stages, using information at different granularities, from word to phrase to sentence, in both text and images. For each stage, this paper also designs new pre-training tasks suited to that stage's information granularity, so as to fully capture the knowledge contained in the limited corpus. For example, to make the model fully learn the correspondence between images and text, this paper designs Image Features Random Shuffle (IFRS), which requires the model to restore the original order of shuffled image features according to the order of the text (a sketch of this task is given below). Experimental results on multiple datasets covering visual question answering, image-text retrieval, and other downstream tasks show that the model's accuracy is comparable to that of the original large model on all downstream tasks, and even exceeds it on some datasets.

This paper further studies the visualization of the proposed multimodal pre-training model and draws some explanatory conclusions about how the model works, including: pre-training at word granularity helps the model align images with text, and pre-training at phrase granularity helps the model learn the attribute information of objects. On this basis, a visualization tool for attention distributions is built (also sketched below) to display the model's internal attention, explore both intra-modal and cross-modal attention patterns, and investigate how the pre-training model learns knowledge from language and uses that knowledge to solve downstream tasks.
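To make the IFRS task concrete, the following is a minimal PyTorch sketch. The thesis does not give implementation details, so the tensor layout, the function name ifrs_batch, and the choice of a cross-entropy loss over region positions are all assumptions, not the thesis's actual code.

```python
import torch

def ifrs_batch(image_feats):
    """Image Features Random Shuffle (IFRS), sketched: permute each
    image's region features and keep the permutation as the target
    the model must recover. All names here are illustrative.

    image_feats: (batch, num_regions, dim) region features.
    Returns the shuffled features and, for each shuffled position,
    the index of that region's original position.
    """
    batch, num_regions, dim = image_feats.shape
    # One independent random permutation per example in the batch.
    perms = torch.stack([torch.randperm(num_regions) for _ in range(batch)])
    shuffled = torch.gather(
        image_feats, 1,
        perms.unsqueeze(-1).expand(-1, -1, dim))
    # perms[b, i] is the original index of shuffled position i, so a
    # position-classification head over num_regions classes could be
    # trained with cross-entropy against perms (one assumed loss choice).
    return shuffled, perms
```

During pre-training, the shuffled features would be fed to the model together with the (unshuffled) text, so that restoring the region order forces the model to ground image regions in the text sequence.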
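Likewise, here is a minimal sketch of the kind of attention visualization the paper describes, assuming a single-stream encoder whose input sequence concatenates text tokens with image region features and whose per-layer attention weights have been stacked into one tensor; the sequence layout and all names are illustrative assumptions.

```python
import torch
import matplotlib.pyplot as plt

def plot_text_to_image_attention(attn, text_tokens, num_regions, layer, head):
    """Plot one head's cross-modal attention from text tokens to
    image regions as a heatmap.

    attn: (num_layers, num_heads, seq_len, seq_len) attention weights
          saved from a forward pass; assumes the input sequence is
          [text tokens, then image region features].
    """
    t = len(text_tokens)
    # Slice the text-to-image block out of the full attention matrix.
    weights = attn[layer, head, :t, t:t + num_regions]
    fig, ax = plt.subplots()
    ax.imshow(weights.detach().cpu().numpy(), aspect="auto")
    ax.set_yticks(range(t))
    ax.set_yticklabels(text_tokens)
    ax.set_xlabel("image region")
    ax.set_title(f"layer {layer}, head {head}")
    plt.show()
```

The same slicing idea extends to the intra-modal blocks (text-to-text and image-to-image), which is how a tool like the one described can compare single-modal and cross-modal attention distributions.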
Keywords/Search Tags: multimodal pre-training, multi-stage pre-training, pre-training task