
Research On Short Video Description Generation And Optimization Method Based On Object Detection

Posted on: 2023-05-12   Degree: Master   Type: Thesis
Country: China   Candidate: Z Ye
GTID: 2568306914956259   Subject: Control Science and Engineering
Abstract/Summary:
With the development of the Internet and 5G communication technologies, video has become a convenient medium for communication and entertainment in everyday life. Processing video data by computer often requires analysis across modalities, and video description is one of the most common such tasks. Short video, as an independent category within the video field, has grown rapidly in recent years. Generating descriptions for short videos greatly simplifies their analysis and processing: the keywords in the description sentences can be used for fast classification, retrieval, personalized recommendation, and abnormal-content detection. Short video description generation has therefore become an active research topic in academia. However, constrained by how current short video description datasets are used and by how generated descriptions are evaluated, existing deep learning methods for short video description generation pay little attention to how well the video and text semantics match.

Focusing on the objects that appear in short videos, this thesis proposes a short video description generation and optimization method based on object detection. Object detection is used to detect the main objects in a short video, and the detected objects serve as a text-modality input that is fed into the description generation network together with the features of the video itself, improving the effectiveness of short video description. The thesis also proposes a semantic perceptual loss that can be combined with the object detection results to select semantic optimization targets and help the network generate better description sentences. The specific research results are as follows.

(1) Short video object detection is implemented. Key frame sequences are constructed using the temporal difference between frames, and a still-image object detection algorithm is then combined with a statistical method to detect the main objects in short videos.

(2) A method is proposed to improve short video description using text-modality information about the objects in the video content. A cross-modal-input short video description generation network is constructed, and comparison experiments that vary only whether the text-modality object information is used verify its effect. Experimental results on the MSR-VTT dataset show that introducing the object text-modality information improves the BLEU-4 and METEOR scores by 0.4, the ROUGE-L score by 0.1, and the SPICE score by 0.2, so the model produces better output.

(3) A semantic perceptual loss is proposed and used to optimize the short video description generation model. Inspired by image generation tasks, this thesis uses deep features of the text for loss computation and applies the loss to short video description generation, and further improves model performance by selecting data with richer semantic information as the optimization target. On the MSR-VTT dataset, the proposed method achieves BLEU-4, METEOR, ROUGE-L, CIDEr, and SPICE scores of 44.1, 30.4, 64.8, 52.0, and 12.1, respectively, improving semantic richness while maintaining syntactic-level quality.
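
The key frame construction and statistical main-object selection described in result (1) could be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the abstract does not give the difference measure, the threshold, or the statistical rule, so the mean absolute pixel difference against the last key frame, the `threshold` parameter, the `min_ratio` voting rule, and every function name below are assumptions. The object detector itself is left out; its per-frame label lists are taken as given input.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical frame representation: a flattened grayscale image.
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(frames: List[Frame], threshold: float) -> List[int]:
    """Return indices of key frames.

    Assumption: a frame becomes a key frame when it differs from the
    most recent key frame by more than `threshold`.
    """
    if not frames:
        return []
    keys = [0]  # the first frame always starts the sequence
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[keys[-1]]) > threshold:
            keys.append(i)
    return keys


def main_objects(labels_per_key_frame: List[List[str]],
                 min_ratio: float = 0.5) -> List[str]:
    """Keep labels detected in at least `min_ratio` of the key frames.

    Assumption: this simple frequency vote stands in for the
    unspecified statistical method.
    """
    n = len(labels_per_key_frame)
    if n == 0:
        return []
    # Count each label at most once per key frame.
    counts = Counter(
        label for labels in labels_per_key_frame for label in set(labels)
    )
    return sorted(label for label, c in counts.items() if c / n >= min_ratio)


if __name__ == "__main__":
    frames = [
        [0, 0, 0, 0],
        [1, 0, 0, 0],
        [10, 10, 10, 10],
        [10, 10, 10, 11],
        [0, 0, 0, 0],
    ]
    print(select_key_frames(frames, threshold=2.0))
    # Per-key-frame labels would come from a still-image detector.
    print(main_objects([["dog", "ball"], ["dog"], ["dog", "person"]]))
```

Comparing against the last key frame, and not against the immediately preceding frame, keeps slow gradual changes from being missed, but either reading is consistent with the abstract's wording.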
Keywords/Search Tags: short video description generation, perceptual loss, object detection, cross-modal task