Study On Semantics-embedded Deep Hashing For Multi-label Video Retrieval

Posted on: 2022-11-03 | Degree: Master | Type: Thesis
Country: China | Candidate: L Cao
GTID: 2568306500950469 | Subject: Computer Science and Technology
Abstract/Summary:
With the popularity of portable mobile devices and the maturity of network transmission technology, the volume of video data has grown massively, and large-scale video retrieval is in high demand in this big-data era. Deep hashing is currently one of the most effective techniques for this task because of its low storage and time costs. Almost all existing video hashing methods are derived from image hashing methods: they usually treat a video as a continuous sequence of images and approximate video features by fusing the features of individual frames. However, a key difference between images and videos is temporal information, which strongly affects retrieval performance. There are three main reasons why existing methods perform unsatisfactorily:
1) Some hashing methods ignore the temporal information of videos, an important property that distinguishes videos from images, and therefore explore video features inadequately;
2) Most hashing methods assign equal weights to all frames in their learning models, neglecting the fact that the content of a video is often determined by a few key frames;
3) Unlike single labels, multiple labels carry richer semantic information, which makes multi-label video retrieval more challenging. Following the single-label definition of similarity ignores the similarity ranking between video pairs with multiple labels and the influence of category associations on similarity; the result is a hard measure that cannot reflect the real distance between two videos.
This thesis proposes a novel semantics-embedded deep hashing method for multi-label video retrieval to solve these problems. First, a hybrid attention module is integrated into the basic CNN+LSTM hashing network for video feature extraction and hash code learning. The attention module consists of a self-attention block and a relation-attention block, which learn weights for different frames. Second, a semantics-embedded soft similarity is defined, which employs a graph convolutional network (GCN) to learn both instance associations and semantic associations. Experiments and comparisons conducted on multi-label video datasets show that the proposed method achieves significantly higher performance than competing methods in the multi-label video retrieval task.
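The third problem above, that a hard 0/1 similarity cannot rank video pairs with multiple labels, can be illustrated with a small sketch. This is not the thesis's GCN-based definition: the functions `hard_similarity`, `soft_similarity` and `propagate`, the cosine formulation, the made-up label set and the hand-set label-correlation matrix `corr` are all hypothetical. A single fixed propagation step merely stands in for the label associations a GCN would learn.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def hard_similarity(y_i: Vector, y_j: Vector) -> int:
    """Hard 0/1 similarity: 1 if the two videos share at least one label, else 0."""
    return int(any(a > 0 and b > 0 for a, b in zip(y_i, y_j)))


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def propagate(y: Vector, corr: Sequence[Vector]) -> List[float]:
    """Hypothetical single step y' = y + C @ y over a label-correlation matrix C
    (non-negative, zero diagonal): each label also 'activates' correlated labels.
    A fixed, parameter-free stand-in for learned GCN label embeddings."""
    n = len(y)
    return [y[k] + sum(corr[k][m] * y[m] for m in range(n)) for k in range(n)]


def soft_similarity(y_i: Vector, y_j: Vector,
                    corr: Optional[Sequence[Vector]] = None) -> float:
    """Graded similarity in [0, 1] between non-negative multi-hot label vectors.
    With `corr`, correlated (not just identical) labels raise the score."""
    if corr is not None:
        y_i, y_j = propagate(y_i, corr), propagate(y_j, corr)
    return cosine(y_i, y_j)


if __name__ == "__main__":
    # Hypothetical label order: dog, park, car, cat.
    query = [1, 1, 0, 0]                 # {dog, park}
    database = {
        "B": [1, 1, 0, 0],               # {dog, park}: same labels
        "C": [1, 0, 1, 0],               # {dog, car}: one shared label
        "D": [0, 0, 0, 1],               # {cat}: no shared label
    }
    # Hand-set correlation: dog and cat assumed to co-occur (made-up value).
    corr = [[0, 0, 0, 0.5],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0.5, 0, 0, 0]]
    for name, y in database.items():
        print(name,
              hard_similarity(query, y),
              soft_similarity(query, y),
              round(soft_similarity(query, y, corr), 3))
    # B 1 1.0 1.0
    # C 1 0.5 0.556
    # D 0 0.0 0.596
```

Under the hard definition, B and C tie at 1 even though B matches every query label; the graded score ranks B above C, and the correlation step gives D a nonzero score through the assumed dog-cat association. The abstract does not state which normalization or propagation rule the thesis actually uses.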
Keywords/Search Tags: video retrieval, deep hashing, multi-label learning, soft similarity, semantic-embedding