
Research On Video Moment Retrieval Based On Graph Convolutional Networks

Posted on: 2022-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: M F Wang
Full Text: PDF
GTID: 2558307154476024
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of network and multimedia technology, video information is increasingly rich and widely used in a variety of scenarios. Facing the diversity and complexity of video information, the Video Moment Retrieval task was proposed to effectively retrieve video clips of interest from a complete video; according to the query medium, it can be divided into text-query-based and video-query-based tasks. In the Video Query based Video Moment Retrieval task, the correspondence between the internal action details of the query clip and the reference video is complex, so an elaborate inter-frame feature interaction and fusion process must be established. To handle this situation and further improve retrieval results, we introduce graph structures and graph convolution into the Video Query based Video Moment Retrieval task. Exploiting the ability of graph structures to express complex relationships, the reference video and the query video are modeled together as one graph; graph convolution is then used to realize the complex inter-frame feature interaction and fusion process, which achieves good results. Building on this work, we also improve the graph structure and obtain better results. Our contributions are as follows:

(1) To better realize complex inter-frame feature interaction and fusion, we propose the Multi-Graph Feature Fusion Network. First, we use the Temporal Actionness Grouping method to extract proposal video clips from the reference video, build triplets from the query clip and the proposal clips, and extract video features from these clips. Then, the features of the query clip and the proposal clip are modeled together as a graph whose nodes are the video features at each timestep, and node connections are built on this graph. Finally, a Multi-Graph Feature Fusion Block further extracts and fuses video features from different timesteps in the graph.

(2) Based on the Multi-Graph Feature Fusion Network, we modify the fixed graph into an input-related graph generated by trainable parameters, so that the graph has learning ability. Specifically, we first use a fully connected layer to transform the video feature dimension, then use the cosine similarity between the video features at different time points as the elements of the adjacency matrix, and apply an L1 sparsity term to constrain the adjacency matrix. Following other Video Query based Video Moment Retrieval methods, we adapt the Video Action Localization datasets ActivityNet v1.2 and THUMOS14 to our experiments. Experimental results show that the proposed method outperforms other methods.
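The second contribution (an input-related graph whose adjacency matrix comes from cosine similarity after a fully connected projection, constrained by an L1 sparsity term, followed by graph-convolution fusion) can be illustrated with a minimal sketch. The PyTorch snippet below is a hypothetical illustration, not the thesis implementation: the module name LearnableGraphFusion, the hidden dimension, the ReLU and row-normalization choices, and the loss weighting are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnableGraphFusion(nn.Module):
    """Sketch: build an input-dependent graph from per-timestep video
    features, then fuse the features with one graph-convolution step.
    Names and design details are illustrative, not from the thesis."""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        # Fully connected layer that transforms the feature dimension
        # before similarities are measured.
        self.proj = nn.Linear(feat_dim, hidden_dim)
        # Weight of the graph-convolution fusion step.
        self.gc_weight = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor):
        # x: (T, feat_dim) -- concatenated query-clip and proposal-clip
        # features, one node per timestep.
        h = self.proj(x)                                # (T, hidden_dim)
        h_norm = F.normalize(h, dim=-1)
        # Cosine similarity between every pair of timesteps gives the
        # entries of the adjacency matrix.
        adj = torch.relu(h_norm @ h_norm.t())           # (T, T), non-negative
        # L1 sparsity term on the adjacency matrix, to be added to the
        # task loss during training.
        l1_sparsity = adj.abs().mean()
        # Row-normalize before propagation (a common choice; the thesis
        # may normalize differently).
        adj_hat = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        # One graph-convolution step: aggregate neighbours, then transform.
        fused = torch.relu(self.gc_weight(adj_hat @ h)) # (T, hidden_dim)
        return fused, l1_sparsity


# Usage sketch: 64 timesteps of 512-d features; the sparsity term is
# weighted into a placeholder loss by a hypothetical coefficient 0.01.
model = LearnableGraphFusion(feat_dim=512, hidden_dim=256)
feats = torch.randn(64, 512)
fused, l1_sparsity = model(feats)
loss = fused.pow(2).mean() + 0.01 * l1_sparsity  # placeholder task loss
loss.backward()
```

In an actual retrieval pipeline the fused node features would feed a matching or localization head over the query-proposal triplets; only the graph construction and fusion step is sketched here.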
Keywords/Search Tags: Video Moment Retrieval, Deep Learning, Multi-Graph Feature Fusion, Graph Convolutional Networks