Human Skeleton Action Recognition Based On Graph Convolutional Neural Network

Posted on: 2024-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Wu
GTID: 2568307136989599
Subject: Control Science and Engineering
Abstract/Summary:
In recent years, skeleton-based behavior recognition has received increasing attention, because skeleton data offers high reliability and good real-time performance. However, the task faces the following challenges: (1) Noise is introduced into human skeleton data during acquisition by object occlusion, limb folding, and similar effects. Further noise is introduced during feature extraction, and this noise ultimately interferes with the recognition of behavior categories. (2) Human skeleton data is highly redundant in space, which makes it difficult to extract the key information. (3) Time-domain modeling captures large-scale dynamic features poorly. To address these problems, this thesis proposes the following solutions:

(1) To solve the noise problem, this thesis proposes a novel spatio-temporal channel feature shrinkage network (STCSN), which dynamically learns feature shrinkage thresholds across channels and across space-time to eliminate irrelevant features and noise. STCSN provides significant performance gains while introducing only a small number of parameters. Adding STCSN to a graph convolutional network yields STCSN-GCN, which achieves significant performance gains on the NW-UCLA and NTU RGB+D 60 datasets.

(2) STCSN shrinks key features along with the noise. To address this problem, this thesis proposes a novel coupled feature shrinkage and enhancement module (CSEM), built on STCSN, for skeleton-based behavior recognition. By relaxing the non-negative constraint on the threshold, CSEM performs both feature shrinkage and feature enhancement with the same feature shrinkage module. Adding CSEM to two graph convolution-based baseline networks yields further significant improvements over STCSN on the NTU RGB+D (60 and 120) and NW-UCLA datasets.

(3) A novel channel-by-channel graph spatial abstraction network (GSAN) is proposed to address the redundancy in the spatial dimensions of skeleton data. This network abstracts N skeleton points into N/4 key skeleton points, which focuses the network on the information carried by key skeleton points. As a result, the many differing details that occur within the same action are excluded, and the features common to instances of the same action are easier to obtain. GSAN combines a channel-sharing fusion modeling module with a channel-by-channel refinement fusion modeling module. Channel-sharing fusion modeling merges two skeleton points that are close in location and carry similar information into a new skeleton point. Channel-by-channel refinement fusion modeling learns from the characteristics of the input data in each channel to generate a learnable graph spatial abstraction convolution, so that the skeleton in each channel is abstracted according to its own characteristics. GSAN is therefore no longer limited to the physical structure of the human skeleton and can dynamically learn abstract topological relationships among all skeleton points. In this thesis, GSAN replaces the graph convolution in the baseline network, and performance improves on all three datasets.

(4) To improve the ability to capture large-scale dynamic features, this thesis proposes a large kernel convolution-based time-domain modeling method (LKCT). Time-domain modeling often stacks small convolution kernels to obtain dynamic features, but the receptive field of stacked small kernels is small, so this thesis replaces the stack of small kernels with large kernels. To capture dynamic features on different time scales, LKCT applies convolution kernels of different sizes to different channels, replacing the 1×3 kernel with kernels ranging from 1×5 to 1×15, so that dynamic features on different time scales are obtained simultaneously. Replacing the time-domain convolution of the baseline network with LKCT yields performance gains on the NTU RGB+D (60 and 120) and NW-UCLA datasets.

Finally, a powerful graph convolutional network for human skeleton behavior recognition is formed by using GSAN for spatial modeling and multi-size large kernel convolution for time-domain modeling of human skeleton data, and by placing CSEM after the spatio-temporal modeling for noise removal and feature enhancement. The network outperforms the baseline network on all three datasets and reaches the current leading level on the NTU RGB+D 120 and NW-UCLA datasets.
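The abstract does not state the shrinkage formula used by STCSN and CSEM. The following is a minimal sketch, assuming the standard soft-thresholding operator and a per-channel threshold proportional to the mean absolute feature value. The names `soft_threshold`, `shrink_channel`, and `scale` are hypothetical, and `scale` is a fixed number here, whereas the thesis learns its thresholds dynamically. A non-negative `scale` corresponds to shrinkage only, as described for STCSN; allowing a negative `scale` illustrates how relaxing the non-negative constraint, as described for CSEM, turns the same operator into feature enhancement.

```python
def soft_threshold(x, tau):
    """Soft thresholding: sign(x) * max(|x| - tau, 0).

    With tau > 0 the value is pulled toward zero (shrinkage).
    With tau < 0 the magnitude grows by |tau| (enhancement).
    """
    magnitude = max(abs(x) - tau, 0.0)
    if x > 0:
        return magnitude
    if x < 0:
        return -magnitude
    return 0.0


def shrink_channel(features, scale):
    """Apply one threshold to all features of a single channel.

    Assumption: threshold = scale * mean(|feature|). In a trained
    network `scale` would be predicted from the input; here it is given.
    """
    mean_abs = sum(abs(v) for v in features) / len(features)
    tau = scale * mean_abs
    return [soft_threshold(v, tau) for v in features]


# Two small values (noise-like) and two large values (signal-like).
channel = [0.1, -0.2, 3.0, -2.5]

shrunk = shrink_channel(channel, 0.5)     # non-negative scale: small values vanish
enhanced = shrink_channel(channel, -0.5)  # negative scale: all magnitudes grow
```

With a positive scale the two small entries are set to zero, but the two large entries also lose magnitude, which is the side effect that point (2) attributes to STCSN.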
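The receptive-field argument in point (4) can be made concrete. For stride-1, undilated convolutions, a stack of L layers with kernel length k covers 1 + L(k - 1) time steps, so seven stacked 1×3 layers are needed to match a single 1×15 kernel. The sketch below also shows one way to give different channels different kernel lengths. It is illustrative only: the abstract does not say how kernel sizes are assigned to channels, so the round-robin assignment, the odd sizes in `KERNEL_SIZES`, the uniform placeholder weights, and all function names are assumptions.

```python
def stacked_receptive_field(kernel_size, layers):
    """Receptive field of `layers` stacked stride-1, undilated 1D convolutions."""
    return 1 + layers * (kernel_size - 1)


def temporal_conv(sequence, kernel):
    """Zero-padded 'same' 1D cross-correlation along time (odd kernel length)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(sequence) + [0.0] * half
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(sequence))
    ]


# Assumed set of sizes; the abstract only gives the range 1x5 to 1x15.
KERNEL_SIZES = [5, 7, 9, 11, 13, 15]


def multi_scale_temporal(channels):
    """Convolve each channel with a kernel whose length depends on the channel.

    Assumption: sizes are assigned round-robin, and each kernel is a
    uniform average standing in for learned weights.
    """
    outputs = []
    for index, sequence in enumerate(channels):
        size = KERNEL_SIZES[index % len(KERNEL_SIZES)]
        kernel = [1.0 / size] * size  # placeholder weights
        outputs.append(temporal_conv(sequence, kernel))
    return outputs


# Six channels, 20 frames each, constant signal for easy inspection.
channels = [[1.0] * 20 for _ in range(6)]
outputs = multi_scale_temporal(channels)
```

Each output keeps the input length, so channels processed at different time scales can still be stacked back into one feature map.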
Keywords/Search Tags: human action recognition, skeleton data, GCN, deep shrinkage network, feature enhancement, graph abstraction network, large kernel convolution