Research on Video Captioning Based on Deep Learning and Its Application in Coal Mines

Posted on: 2022-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ma
Full Text: PDF
GTID: 2481306554450474
Subject: Computer application technology
Abstract/Summary:
Video captioning is a challenging task that spans computer vision and natural language processing. Its goal is to convert visual content into accurate, concise text descriptions. Video captioning has broad application prospects in many fields and has attracted growing attention, particularly in coal mining, where it reduces the difficulty and time of retrieving coal mine videos and where making underground monitoring video intelligent is of great significance. Because of the large gap between a video's low-level visual features and its high-level semantics, this thesis combines video feature extraction with visual text detection to improve deep-learning-based video captioning. The main research contents are as follows:

(1) In earlier encoder-decoder models, video features of any length are encoded into a fixed-length representation, so caption quality degrades as the input video features grow longer. Introducing an attention mechanism improves performance on the encoder-decoder task by letting the model assign higher weights to key regions of the video. This thesis therefore proposes a video captioning model based on a 3D residual network with attention. First, in the encoding stage, the attention mechanism is introduced into the 3D residual module, and the video feature maps are enhanced through one-dimensional channel attention and two-dimensional spatial attention to reduce the influence of irrelevant targets and noise. Second, the GloVe model is used to vectorize the description text in order to strengthen the correlation between words. Finally, in the decoding stage, a two-layer LSTM network exploits its sequential modeling capability to output a text description that expresses the high-level semantics of the video. Experiments on two public datasets show that the model describes the high-level semantic information of videos in natural language more accurately.

(2) Most video captioning algorithms do not fully describe the details of targets in the video and tend to ignore the text that appears in it. To address this, a video captioning method based on visual text and residual connections is proposed. First, the BERT model is used to detect visual text in the video. Second, this visual text is merged with the output of the first-layer GRU network and fed into the second-layer GRU network. Finally, to obtain a closer mapping between videos and text descriptions, residual connections are built between the GRU layers. The experimental results show that the model can describe detailed information in the video, which greatly improves caption quality.

(3) The video captioning algorithms proposed in this thesis are applied to coal mine scenes. First, underground coal mine monitoring video is preprocessed to produce a coal mine description dataset, which is used to train the proposed model. Second, because underground monitoring video often contains the time and place of an event, the subtitles extracted from the video are incorporated into the text description generated by the LSTM language model, making the resulting description of the coal mine surveillance video more specific. Finally, the experimental results show that the proposed model performs well on the coal mine description dataset.
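The channel and spatial attention described in contribution (1) can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the 3D convolutions of the residual module are omitted, the weight matrices `w1` and `w2`, the reduction ratio `r`, and the scalar mixing weights `a` and `b` (standing in for a learned convolution over the pooled maps) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Reweight channels of x with shape (C, T, H, W).

    Each channel is squeezed to one scalar by global average pooling,
    passed through a small two-layer bottleneck (assumed weights w1, w2),
    and turned into a per-channel gate in (0, 1).
    """
    pooled = x.mean(axis=(1, 2, 3))            # (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)      # (C // r,), ReLU
    gate = sigmoid(w2 @ hidden)                # (C,)
    return x * gate[:, None, None, None]

def spatial_attention(x, a=1.0, b=1.0):
    """Reweight spatial positions of every frame.

    Average and max maps over the channel axis are mixed with assumed
    scalars a and b; a learned convolution would normally do this mixing.
    """
    avg_map = x.mean(axis=0)                   # (T, H, W)
    max_map = x.max(axis=0)                    # (T, H, W)
    gate = sigmoid(a * avg_map + b * max_map)  # one 2D map per frame
    return x * gate[None]

def attentive_residual_block(x, w1, w2):
    """Residual shortcut around channel attention followed by spatial attention."""
    return x + spatial_attention(channel_attention(x, w1, w2))

# Toy feature map: 8 channels, 4 frames, 6x6 spatial grid (illustrative sizes).
rng = np.random.default_rng(0)
C, T, H, W, r = 8, 4, 6, 6, 2
x = rng.standard_normal((C, T, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

out = attentive_residual_block(x, w1, w2)
print(out.shape)  # (8, 4, 6, 6)
```

Because both gates lie strictly between 0 and 1, the attention branch can only attenuate features, which is how irrelevant targets and noise are suppressed; the residual shortcut keeps the original features available to later layers.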
Keywords/Search Tags: Video captioning, Deep learning, Attention mechanism, Coal mine scene, BERT model