Multi-Level Context Modeling For Video Content Analysis

Posted on:2022-11-28

Degree:Master

Type:Thesis

Country:China

Candidate:X R Li

GTID:2558307154979349

Subject:Engineering

Abstract/Summary:

With the development of deep learning, video understanding tasks have become increasingly complex and diverse, and context modeling has become a focus of research on video content analysis. Current video context modeling methods mainly rely on variants of recurrent neural networks to analyze video content. However, these methods mine contextual information neither completely nor deeply enough. This thesis studies video context modeling through two tasks: video anomaly detection and video question answering.

First, in video anomaly detection, abnormal events are unevenly distributed in surveillance videos, and whether an event counts as abnormal depends heavily on its context. Because existing models struggle to identify complex abnormal behavior in video sequences, this thesis proposes a multi-level context modeling method based on graph convolution. In the feature aggregation stage of the graph convolution, the non-local similarity features of node pairs are combined with temporally local features to obtain multi-level context features. In the feature extraction stage of the graph network, a non-local attention module with instance normalization selects the necessary information, which addresses missed detections of abnormal events in long videos. Experiments on two large benchmark datasets verify the effectiveness of the algorithm.

Second, in video question answering, motion occurs frequently within a small spatio-temporal range, which makes it difficult for existing models to express rich and complete video context features. This thesis proposes a complementary multi-level context modeling method with a dual-branch structure. The first branch uses a Transformer with relative position representations to connect different moments in a video sequence, learn multi-scale video features, and strengthen the expression of non-local features. The second branch is the proposed de-redundancy module, which drives successive video clips to learn different information, strengthens the expression of local features, and addresses the difficulty of expressing complex semantic features in short videos. Experiments on three benchmark datasets verify the effectiveness of the algorithm.
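
The abstract gives no formulas for the graph-convolution aggregation used in the anomaly detection method, so the following is only a minimal sketch of the general idea of combining a non-local similarity graph with a temporally local graph over video snippets. Every name here (`similarity_adjacency`, `temporal_adjacency`, `multi_level_context`, the `scale` parameter), the use of cosine similarity, the exponential distance decay, and the plain averaging of the two branches are assumptions made for illustration, not the thesis's actual design. Learned weights, nonlinearities, and the attention module are omitted.

```python
import math


def row_normalize(matrix):
    """Scale each row so that its entries sum to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total for value in row])
    return normalized


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def similarity_adjacency(features):
    """Non-local branch (assumed form): softmax over pairwise cosine similarity."""
    weights = [[math.exp(cosine(a, b)) for b in features] for a in features]
    return row_normalize(weights)


def temporal_adjacency(num_snippets, scale=1.0):
    """Local branch (assumed form): weights decay with temporal distance |i - j|."""
    weights = [
        [math.exp(-abs(i - j) / scale) for j in range(num_snippets)]
        for i in range(num_snippets)
    ]
    return row_normalize(weights)


def aggregate(adjacency, features):
    """One propagation step: each node becomes a weighted average of all nodes."""
    dim = len(features[0])
    return [
        [sum(w * f[d] for w, f in zip(row, features)) for d in range(dim)]
        for row in adjacency
    ]


def multi_level_context(features, scale=1.0):
    """Average the non-local and temporally local aggregations (illustrative choice)."""
    non_local = aggregate(similarity_adjacency(features), features)
    local = aggregate(temporal_adjacency(len(features), scale), features)
    return [
        [(a + b) / 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(non_local, local)
    ]


# Four toy snippet features; snippet 2 differs from the others.
snippets = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.1]]
context = multi_level_context(snippets)
```

In this sketch the similarity graph lets snippet 0 draw on snippet 3, which is distant in time but similar in content, while the temporal graph favors its immediate neighbor, snippet 1. Averaging the two gives each snippet a feature that reflects both kinds of context.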

Keywords/Search Tags:

Video Context, Multi-Level, Graph Network, Complementary, Video Anomaly Detection, Video Question Answering

Related items

1	Research On Affective Visual Question Answering
2	Research And Implementation Of Video Question Answering With Multimodal Data
3	Object-oriented Two-Stream Network And Heterogeneous Graph Reasoning On Video Question Answering
4	A Large-scale Video Question Answering Benchmark For Fine-grained Compositional Reasoning
5	Multi-Grained Hierarchical Attentional Recurrent Network For Video Question Answering
6	Video Question Answering Based On Deep Memory Fusion Method
7	Research On Video-grounded Multi-turn Dialogues Of Question Answering
8	A Research Of Video Question Answering Based On Deep Learning
9	Research On Abnormal Event Detection Method Based On Video Context Semantics
10	Video Question Answering Based On Attention Mechanism And Graph Convolutional Network