Automatic Audio Captioning Based On Reinforcement Learning

Posted on:2023-12-12

Degree:Master

Type:Thesis

Country:China

Candidate:G Y Chen

Full Text:PDF

GTID:2568306836972449

Subject:Electronic and communication engineering

Abstract/Summary:

Automated audio captioning is a cross-modal text generation task that aims to describe the content of input audio in natural language. Compared with traditional tasks such as audio classification, automated audio captioning is more complex, but it also has broader application prospects, such as assistive services for people with disabilities. Existing work mainly investigates new methods and tries to improve performance on existing datasets. Because cross-modal pre-trained resources are lacking, few studies try to improve the system with pre-trained models. The field currently faces problems such as too few available datasets and poor captions generated by the decoder. To address these problems, this thesis presents an audio captioning system with an encoder-decoder architecture, in which the decoder predicts words based on audio features extracted by the encoder. We then use different single-modal pre-trained resources to improve the system. The specific contributions are as follows:

(1) To improve the performance of the multi-modal system with single-modal pre-trained resources, this thesis takes advantage of two audio-modality resources. First, we introduce PANNs, a pre-trained model from the audio classification task, into our audio captioning system and use it to initialize the parameters of the encoder. Then, we use the AudioCaps dataset to pre-train the overall system, so that the encoder extracts audio features that are better suited for the decoder to generate captions. Experimental results of training on the Clotho dataset show that these audio-modality pre-trained resources are effective and can significantly improve the performance of the multi-modal audio captioning system.

(2) In addition to the audio-modality pre-trained resources, this thesis also explores applying a text-modality pre-trained method to the audio captioning system. We use a method from reinforcement learning that targets text evaluation metrics to improve the performance of the system, so that the model generates better captions. Finally, we combine the audio-modality and text-modality methods. Experimental results of training on the Clotho dataset show that the combined methods can greatly improve the final performance of the cross-modal automated audio captioning system.
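The reinforcement learning step in (2) can be illustrated with a small sketch. The abstract names neither the algorithm nor the metric, so this example assumes a self-critical policy-gradient setup, a common choice for captioning, in which the reward of a sampled caption is compared against the reward of the greedy caption. The function names `unigram_f1` and `self_critical_loss`, the toy captions, and the unigram-overlap reward (a stand-in for a real metric such as CIDEr) are all hypothetical and not taken from the thesis.

```python
import math
from collections import Counter


def unigram_f1(candidate, reference):
    """Toy reward: unigram-overlap F1 between two token lists.

    Assumption: stands in for a real captioning metric such as CIDEr.
    """
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def self_critical_loss(sampled, sampled_log_probs, greedy, reference):
    """Policy-gradient loss using the greedy caption's reward as baseline.

    sampled_log_probs holds log p(token) for each token of the sampled caption.
    A positive advantage means minimizing the loss raises those log-probabilities.
    """
    advantage = unigram_f1(sampled, reference) - unigram_f1(greedy, reference)
    return -advantage * sum(sampled_log_probs)


# Hypothetical captions for one audio clip.
reference = "a dog barks while rain falls".split()
greedy = "a dog barks".split()
sampled = "a dog barks while rain pours".split()
log_probs = [math.log(0.5)] * len(sampled)  # made-up decoder probabilities

loss = self_critical_loss(sampled, log_probs, greedy, reference)
print(f"loss = {loss:.4f}")
```

In a real system the log-probabilities would come from the decoder and the loss would be backpropagated through it. Here the sampled caption scores higher than the greedy one, so the update would make the sampled caption more likely.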

Keywords/Search Tags:

Automated audio captioning, Reinforcement learning, Transfer learning, Deep learning, Multi-modal task

Related items

1	Application Of Multi-Task Based Audio Feature Extraction In Audio Captioning System
2	Research On Multi-task Learning Based Image Captioning Algorithm
3	Image Captioning Based On Deep Learning And Multi-Metric Reinforcement Learning
4	Research On Key Techniques Of Differentially Private Transfer Learning
5	AGV Task Scheduling Method Based On Transfer Reinforcement Learning
6	Research On Deep Learning-Based Representation Learning Algorithms
7	Research On Image Captioning Algorithms Based On Deep Learning
8	Deep Multimodal Attention Learning For Image Captioning
9	Source Task Selection For Transfer Learning In Reinforcement Learning
10	Researches On Short Video Captioning Based On Deep Learning