
Research And Application Of Image And Language Cross-modal Deep Learning In The Field Of Instrumentation

Posted on: 2023-06-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Gao
Full Text: PDF
GTID: 2532306911482854
Subject: Measuring and Testing Technology and Instruments
Abstract/Summary:
With the development of deep learning, many fields of artificial intelligence have advanced substantially, including natural language processing and multi-modal processing. In recent years, great progress has been made on the multi-round dialogue rewriting task, the multi-modal image-text question answering task, and the cross-modal dialogue task. Research on the cross-modal visual dialogue question answering task, however, remains relatively scarce, even though it contributes to the development of artificial intelligence. This task can be divided into two sub-tasks: multi-round dialogue rewriting and multi-modal image-text question answering; the latter involves the two directions of modal fusion and modal alignment. This thesis argues that collaborative learning can serve as a means of assisting multi-modal tasks, and that introducing it helps complete the cross-modal visual dialogue question answering task.

First, in multi-round dialogue, the many references and omissions easily produce ambiguity. If the dialogue is fed into a multi-modal model without rewriting, the text lacks the words needed to align image features with the text embedding, so a multi-round dialogue rewriting module is used to restore complete sentences. Existing rewriting models draw the missing words only from the dialogue history; this thesis introduces a collaborative learning mechanism that additionally supplies contextual visual collaborative information. In multi-modal image-text question answering, the answer is produced from image information and therefore itself constitutes contextual visual collaborative information. Feeding this answer into the next round of dialogue rewriting creates a synergy between visual and textual information: visual information can be restored in the text to be rewritten, which improves rewriting accuracy.

On this basis, the thesis proposes a cross-modal collaborative visual dialogue question answering model that incorporates contextual visual collaborative information. Injecting this information into the rewriting task strengthens the role of visual information in the sentence and yields rewritten sentences that contain it. The rewritten sentence is converted into a text embedding, image features are extracted as visual information, and a dual-stream multi-modal processing method is adopted: the two streams are first encoded independently, then cross-learned in a cross-modal stage that fuses and aligns the visual and textual information, finally yielding cross-modal information and the answer to the image-text question. This answer is passed as contextual visual collaborative information into the next round of dialogue rewriting, forming a collaborative learning loop that improves both the accuracy of rewriting and the quality of multi-modal fusion and alignment of the dialogue. Based on this idea, the cross-modal visual dialogue question answering task is studied theoretically, the key formulas are derived, and an overall model diagram is given. The model comprises a multi-round dialogue rewriting module, a multi-modal fusion and alignment module, and a collaborative learning module, completing the construction of the cross-modal collaborative visual dialogue question answering model.
Keywords/Search Tags:Natural language processing, Deep learning, Multi-round dialogue rewriting, Multimodality, Visual question answering, Collaborative learning, Instrument field