With the rapid development of the Internet, video has gradually become the dominant carrier of information. Existing supervised learning methods train network models on label information and therefore require large-scale manually annotated datasets, whose annotation undoubtedly consumes considerable resources and time. Self-supervised learning can avoid this cost; in particular, contrastive learning acquires representation ability by distinguishing positive from negative samples. To further improve the representation performance of contrastive learning, this paper conducts research on feature temporality, positive and negative samples, and residual space:

First, to address the problem that complex backgrounds and insufficient temporal features in video data limit the representation effectiveness of self-supervised contrastive learning, a video complementary collaborative contrastive representation learning model is proposed. The network is first trained by instance contrastive learning in the original RGB and optical-flow spaces separately. Feature temporality is then increased, and the complementary information between different views of the same data source is used to mine additional positive samples for retraining the model. This method improves the accuracy of distinguishing positive from negative samples and thereby the video representation performance of the model.

Second, to address the problem that the singularity of pretext tasks on video data and the scarcity of hard negative samples limit the representation effectiveness of self-supervised contrastive learning, a Pretext-Contrast representation learning model based on hard negative samples is proposed. The model combines pretext-task-based representation learning with contrastive learning to further improve its spatio-temporal representation performance. In addition, a feature-level fusion method is proposed that expands the negative sample set by combining query samples with negative samples to generate hard negative samples, effectively improving the representation performance of the model.

Finally, to address the problem that insufficient motion information in the input data and the lack of temporal coherence in video features limit the representation effectiveness of self-supervised contrastive learning, a residual contrastive representation learning model based on temporal diversity is proposed. This model adds a temporal contrastive loss to increase the temporal diversity of features. In addition, a residual-frame view is introduced into the model, and strong spatial augmentation is used to further improve its spatio-temporal representation performance, yielding significant gains on video understanding tasks.
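The abstract does not specify the exact form of the instance contrastive objective; a common choice in this line of work is an InfoNCE-style loss, which treats each clip as its own class and contrasts a query embedding against one positive and many negatives. The following NumPy sketch (function names are illustrative, not the author's) shows that idea for a single query:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere, as is standard before a cosine-similarity contrast."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def info_nce_loss(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query: pull the positive close, push the negatives away.

    query:     (D,)  embedding of the anchor clip/view
    positive:  (D,)  embedding of another view of the same clip
    negatives: (N, D) embeddings of other clips
    """
    q = l2_normalize(query)
    pos = l2_normalize(positive)
    negs = l2_normalize(negatives, axis=1)
    # Similarity logits: positive at index 0, negatives after it.
    logits = np.concatenate([[q @ pos], negs @ q]) / temperature
    logits -= logits.max()  # numerical stability before the softmax
    log_probs = logits - np.log(np.exp(logits).sum())
    # Cross-entropy with the positive as the correct "class".
    return -log_probs[0]
```

In the proposed model this kind of contrast would be computed separately in the RGB and optical-flow spaces, with positives mined across the two views; the sketch above only shows the single-view building block.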
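The feature-level fusion for generating hard negatives is described only as "combining query samples with negative samples". One plausible reading, sketched below under that assumption, is convex mixing in embedding space: blending a small amount of the query into existing negatives yields synthetic negatives that lie closer to the query and are therefore harder to discriminate. All names here are illustrative:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def mix_hard_negatives(query, negatives, n_mix=4, beta_max=0.5, seed=0):
    """Synthesize hard negatives by mixing the query into sampled negatives.

    Each synthetic sample is beta * query + (1 - beta) * negative with
    beta < beta_max, so it remains a negative but moves toward the query,
    expanding the negative set with harder examples.
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(negatives), size=n_mix)
    betas = rng.uniform(0.0, beta_max, size=(n_mix, 1))
    mixed = betas * query + (1.0 - betas) * negatives[idx]
    return l2_normalize(mixed, axis=1)
```

The mixed features can simply be appended to the negative set before computing the contrastive loss, which is how they "expand the negative sample set" in the abstract's terms.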
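The residual-frame view mentioned in the final contribution has a standard construction: subtracting consecutive frames, which cancels static background and keeps mostly motion. A minimal sketch of that preprocessing step:

```python
import numpy as np

def residual_frames(clip):
    """Convert a video clip (T, H, W, C) into T-1 residual frames.

    Each residual frame is the difference between consecutive frames;
    static background regions cancel to zero, so the result emphasizes
    motion information that plain RGB input under-represents.
    """
    clip = clip.astype(np.float32)
    return clip[1:] - clip[:-1]
```

Feeding such residual clips (together with strong spatial augmentation) to the encoder is one way to supply the motion information whose absence the abstract identifies in raw RGB input.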