
Spatiotemporal Representation Learning For Skeleton-Based Human Action Recognition

Posted on: 2024-07-19
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Ma
GTID: 2568307112460794
Subject: Electronic information
Abstract/Summary:
With the rapid growth of computing power and non-contact sensing devices, skeleton-based human action recognition has found applications in many fields, such as computer-aided diagnosis and treatment equipment, motion-sensing (somatosensory) game development, and intelligent security. The spatial and temporal domains are both critical and challenging entry points for modeling action features. Current methods focus on extracting high-quality spatiotemporal representations from skeleton sequences to improve robustness. However, view consistency in the spatial domain and pertinence (focusing on the most relevant frames) in the temporal domain still need further study. Specifically, models tend to give different predictions when different people perform the same action or when it is viewed from different angles. In addition, as with the human visual system, paying equal attention to every frame of a sequence substantially increases the model's computational load and weakens its ability to capture key action features. To address these challenges, this thesis proposes an action modeling framework based on spatiotemporal features that integrates two modules: a spatial Human View Reset Module (HVRM) and a temporal Directional Attention Module (DAM). First, the HVRM is built on convolutional neural networks. Its parameters are learned during training, and it automatically resets skeletons captured from different viewpoints to the best observation viewpoint, which effectively reduces the effect of viewpoint changes on recognition accuracy. Second, the DAM is built on gated recurrent units (GRUs). It automatically weights the sequence along the X, Y, and Z directions according to the importance of the action in each frame, making temporal modeling more targeted so that key action frames contribute more to recognition accuracy. Third, a data preprocessing method is proposed that enhances the expressiveness of the samples and improves the model's robustness. A pre-trained residual network then predicts the action class, and the HVRM, the DAM, and the residual network together form an end-to-end deep learning network. Finally, ablation experiments on a large-scale international dataset verify the effectiveness and correctness of the proposed modules. The proposed model is further evaluated on four challenging public datasets, and its results are compared with those of the latest methods to further validate its performance.
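To make the two modules more concrete, here is a minimal pure-Python sketch of one plausible reading of what they compute on a skeleton sequence (frames of joints, each an (x, y, z) point). It rests on several assumptions not stated in the abstract: the view reset is treated as a rigid translation plus rotation whose parameters (`angles`, `origin`) are passed in by hand rather than predicted by the trained CNN, and the GRU in the directional attention module is replaced by externally supplied per-axis, per-frame `scores` that are normalized with a softmax over time. The names `reset_view` and `directional_attention` are hypothetical, and the residual-network classifier is omitted.

```python
import math

# A skeleton sequence: frames -> joints -> (x, y, z).
Joint = tuple[float, float, float]
Sequence = list[list[Joint]]
Matrix3 = list[list[float]]


def _matmul3(a: Matrix3, b: Matrix3) -> Matrix3:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(ax: float, ay: float, az: float) -> Matrix3:
    """Return R = Rz(az) @ Ry(ay) @ Rx(ax), angles in radians."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return _matmul3(rz, _matmul3(ry, rx))


def reset_view(seq: Sequence, angles: tuple[float, float, float],
               origin: Joint) -> Sequence:
    """Hypothetical view reset: subtract `origin`, then rotate every joint.

    Assumption: in the thesis the parameters come from a trained CNN (HVRM);
    here they are supplied directly.
    """
    r = rotation_matrix(*angles)
    out: Sequence = []
    for frame in seq:
        new_frame = []
        for joint in frame:
            p = [joint[d] - origin[d] for d in range(3)]
            new_frame.append(tuple(sum(r[i][k] * p[k] for k in range(3)) for i in range(3)))
        out.append(new_frame)
    return out


def _softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def directional_attention(seq: Sequence, scores: list[list[float]]) -> Sequence:
    """Hypothetical per-axis frame weighting.

    scores[d][t] is a relevance score for frame t along axis d (0=X, 1=Y, 2=Z).
    Assumption: the thesis obtains such scores from a GRU (DAM); here they are
    given. Weights are a softmax over time per axis, multiplied by the number
    of frames so that uniform scores leave the sequence unchanged.
    """
    t_len = len(seq)
    weights = [_softmax(scores[d]) for d in range(3)]
    return [
        [tuple(joint[d] * weights[d][t] * t_len for d in range(3)) for joint in frame]
        for t, frame in enumerate(seq)
    ]


# Usage: two frames, two joints; rotate 90 degrees about Z, then emphasize
# the second frame along X only.
seq = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]]
rotated = reset_view(seq, (0.0, 0.0, math.pi / 2), (0.0, 0.0, 0.0))
weighted = directional_attention(rotated, [[0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
```

Applying the reset before the attention mirrors the order in which the abstract describes the modules, so the frame weights act on view-normalized coordinates; in the actual end-to-end network both sets of parameters would be learned jointly with the classifier.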
Keywords/Search Tags: Human Action Recognition, Spatiotemporal Representation, View Invariance, Temporal Attention, Deep Learning