Research On Action Recognition Based On Skeleton Data

Posted on: 2024-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: X L Zhang
GTID: 2568307157984619
Subject: Mathematics
Abstract/Summary:
With the rapid development of action recognition, many supervised techniques based on RGB video have achieved remarkable results, but challenges remain. In contrast, self-supervised learning algorithms can draw on large amounts of training data at low cost, without labels, and can learn more valuable information from unlabeled data. This thesis studies self-supervised human skeleton action recognition. The main research contents are as follows:

(1) Human skeleton action recognition based on cross-scale graph contrastive learning

Traditional self-supervised learning models based on the human skeleton usually rely on contrastive learning modules for representation learning. Existing contrastive learning modules use data augmentation to construct similar positive samples and treat all remaining samples as negatives, which limits how well the semantic information shared by similar samples can be expressed. To address this, an action recognition algorithm that combines graph contrastive learning with cross-scale consistent knowledge mining is proposed. First, a new data augmentation method is designed around the skeleton graph structure: it performs random edge cutting on the input skeleton sequence to obtain two different views, which strengthens the semantic correlation between different views of the same skeleton sequence. Second, to alleviate the low embedding similarity between similar samples, a self-supervised co-training network is introduced. It uses the complementary information between different scales of the same skeleton data to obtain positive samples across two skeleton scales, achieving both association within a single scale and semantic collaborative interaction across scales. Finally, the model is evaluated under the linear evaluation protocol. Experimental results on the NTU RGB+D 60 and NTU RGB+D 120 datasets show that the proposed method improves recognition accuracy by 2% to 3.5% on average over current mainstream methods.

(2) Cross-view nearest neighbor contrastive learning of human skeleton representation

Traditional self-supervised contrastive learning approaches treat different views of the same skeleton sequence as a positive pair for the contrastive loss, while other existing methods use a cross-modal retrieval algorithm over the same skeleton sequence to select positives. Both lines of work share a limitation: they do not use the other views produced by data augmentation to obtain more positives. We therefore propose a novel and generic Cross-View Nearest Neighbor Contrastive Learning framework for self-supervised action Representation (CrosNNCLR) at the view level, which can be integrated into contrastive learning networks in a plug-and-play manner. CrosNNCLR uses different views of skeleton augmentation to retrieve nearest neighbors from features in the latent space and treats them as positive embeddings. Extensive experiments on the NTU RGB+D 60/120 and PKU-MMD datasets show that CrosNNCLR can outperform previous state-of-the-art methods. Specifically, when equipped with CrosNNCLR, the performance of SkeletonCLR and AimCLR improves by 0.4% to 12.3% and 0.3% to 1.9%, respectively.
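
The two mechanisms described in the abstract, random edge cutting on the skeleton graph and nearest-neighbor selection of positives in the latent space, can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The toy 5-joint skeleton, the function names (`random_edge_cut`, `cosine`, `nearest_neighbor_positives`), the use of cosine similarity, and the embedding values are all assumptions made for illustration only.

```python
import math
import random

# Hypothetical 5-joint toy skeleton given as bone (edge) index pairs.
# This is an illustrative graph, not the one used in the thesis.
TOY_EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def random_edge_cut(edges, drop_prob, rng):
    """Drop each edge independently with probability drop_prob."""
    return [edge for edge in edges if rng.random() >= drop_prob]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def nearest_neighbor_positives(queries, bank):
    """For each query embedding, return the index of the most similar bank embedding."""
    return [
        max(range(len(bank)), key=lambda j: cosine(query, bank[j]))
        for query in queries
    ]


# Two augmented views of the same toy skeleton graph.
rng = random.Random(0)
view_a = random_edge_cut(TOY_EDGES, 0.3, rng)
view_b = random_edge_cut(TOY_EDGES, 0.3, rng)

# Made-up 2-D embeddings: queries stand in for features of one view,
# the bank stands in for stored features of another view.
queries = [[1.0, 0.0], [0.0, 1.0]]
bank = [[0.0, 2.0], [3.0, 0.1], [-1.0, 0.0]]
extra_positives = nearest_neighbor_positives(queries, bank)
```

In a contrastive setup along these lines, the bank entries selected by `extra_positives` would be treated as additional positives alongside the usual augmented pair. How the thesis builds its feature bank and loss is not stated in the abstract, so those parts are left out here.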
Keywords/Search Tags: self-supervised learning, human skeleton action representation, cross-scale graph contrastive learning, cross-view nearest neighbor contrastive learning, plug-and-play