
Action Recognition Based On Spatiotemporal Features Of Human Skeleton

Posted on: 2021-02-08   Degree: Master   Type: Thesis
Country: China   Candidate: L He
GTID: 2428330614470098   Subject: Computer technology
Abstract:
The analysis, retrieval, and recognition of human actions in video sequences is an important task in computer vision and an active research topic in machine vision, pattern recognition, and artificial intelligence. It has many applications, including video surveillance, human-computer interaction, intelligent robots, and virtual reality. RGB images and videos of human actions contain not only complex, moving backgrounds but also uncertain factors such as changes in lighting and in the appearance of the human body, which limit the performance of RGB-based action recognition. Human skeleton sequences largely overcome these limitations. In particular, as advances in pose estimation have made skeleton data easy to obtain, skeleton-based action recognition has received much attention over the past several years. A human skeleton sequence contains not only temporal features but also the structural spatial features of the human body, and how to effectively extract discriminative spatiotemporal features from such sequences remains an open problem. Based on deep learning, this dissertation aims to design an accurate, efficient, and robust method for modeling dynamic skeletons for action recognition. The main contributions and innovations are as follows:

1. For the posture information of the human body, a two-stream convolutional neural network separately extracts the spatial and temporal features of the video sequences, and motion representations are generated by fusing these spatial and temporal features. Finally, the resulting posture features are combined with the skeleton information extracted by a long short-term memory (LSTM) network.
2. For the spatial features, the human skeleton is first decomposed into five parts that are processed separately. Each part is modeled with a three-dimensional graph convolutional neural network (GCNN), and the part features are then fused according to the spatial relationships among the body parts to represent the human action. This action representation can effectively capture the complex spatiotemporal patterns of human actions.
3. For the temporal features, a long skeleton sequence is first sparsely sampled to obtain a set of short-term sequences of the same length, which are fed to an LSTM network with shared weights. The outputs for all the short-term sequences are then fused to obtain the temporal feature of the whole skeleton sequence. Finally, the temporal and spatial features are classified and fused to perform action recognition.

The above methods have been evaluated on two public datasets, NTU RGB+D and Kinetics. The experimental results show that, compared with other methods, the proposed methods effectively model the spatiotemporal features of human skeletons and achieve promising action recognition performance.
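Contribution 2 splits the skeleton into five parts and models each part with a graph convolutional network. The abstract gives neither the joint grouping nor the layer definition, so the following is only a minimal Python sketch under stated assumptions: it uses the 25-joint NTU RGB+D layout with its commonly used bone list, a hypothetical five-part grouping (`BODY_PARTS`: torso, two arms, two legs), and one spatial graph-convolution layer of the form ReLU(D^-1/2 (A+I) D^-1/2 X W) applied to a single frame, not the thesis's three-dimensional GCNN. All names (`part_adjacency`, `graph_conv`, `encode_parts`) are hypothetical.

```python
import math

# Commonly used NTU RGB+D bone list for the 25-joint skeleton (1-based joint ids).
NTU_EDGES = [
    (1, 2), (2, 21), (3, 21), (4, 3), (5, 21), (6, 5), (7, 6), (8, 7),
    (9, 21), (10, 9), (11, 10), (12, 11), (13, 1), (14, 13), (15, 14),
    (16, 15), (17, 1), (18, 17), (19, 18), (20, 19), (22, 23), (23, 8),
    (24, 25), (25, 12),
]

# Hypothetical five-part grouping; the thesis does not list its exact joints.
BODY_PARTS = {
    "torso": [1, 2, 3, 4, 21],
    "left_arm": [5, 6, 7, 8, 22, 23],
    "right_arm": [9, 10, 11, 12, 24, 25],
    "left_leg": [13, 14, 15, 16],
    "right_leg": [17, 18, 19, 20],
}


def part_adjacency(joints):
    """A + I restricted to one part: keep only bones whose two ends lie in the part."""
    pos = {j: i for i, j in enumerate(joints)}
    n = len(joints)
    adj = [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]
    for a, b in NTU_EDGES:
        if a in pos and b in pos:
            adj[pos[a]][pos[b]] = adj[pos[b]][pos[a]] = 1.0
    return adj


def graph_conv(x, adj, weight):
    """One spatial layer: ReLU(D^-1/2 (A+I) D^-1/2 X W) for a single frame."""
    n = len(adj)
    deg = [sum(row) for row in adj]  # degrees include the self-loop, so never zero
    norm = [[adj[i][k] / math.sqrt(deg[i] * deg[k]) for k in range(n)] for i in range(n)]
    in_dim, out_dim = len(weight), len(weight[0])
    # Neighbourhood aggregation: each joint mixes its own and its neighbours' features
    agg = [[sum(norm[i][k] * x[k][c] for k in range(n)) for c in range(in_dim)]
           for i in range(n)]
    # Linear projection to out_dim channels followed by ReLU
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(in_dim)))
             for o in range(out_dim)] for i in range(n)]


def encode_parts(frame, weight):
    """Run the layer on each part separately and average over the part's joints.

    Assumption: one weight matrix is shared by all parts, purely for brevity.
    """
    features = {}
    for name, joints in BODY_PARTS.items():
        x = [frame[j - 1] for j in joints]  # (x, y, z) coordinates, 1-based ids
        h = graph_conv(x, part_adjacency(joints), weight)
        features[name] = [sum(col) / len(h) for col in zip(*h)]
    return features
```

Cross-part bones such as (5, 21) are deliberately dropped inside each part; concatenating the five part vectors in a fixed order would be one simple, assumed stand-in for the thesis's fusion based on body-part relationships, and separate weights per part would be an equally plausible design.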
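The first step of contribution 3, sparsely sampling a long skeleton sequence into short sequences of equal length, can be illustrated with a small deterministic sampler. This is a sketch under an assumption: clip start positions are spread evenly from the first frame to the last admissible start, which is one common choice; the thesis may instead sample randomly within segments. The helpers `sparse_sample` and `gather` are hypothetical names.

```python
def sparse_sample(num_frames, num_clips, clip_len):
    """Return frame indices of num_clips equal-length clips spread over the sequence."""
    if clip_len > num_frames:
        raise ValueError("clip_len cannot exceed the sequence length")
    span = num_frames - clip_len  # last admissible start index
    if num_clips == 1:
        starts = [span // 2]  # a single clip is taken from the middle
    else:
        # Evenly spaced starts, from frame 0 to the last admissible start
        starts = [round(k * span / (num_clips - 1)) for k in range(num_clips)]
    return [list(range(s, s + clip_len)) for s in starts]


def gather(sequence, clips):
    """Turn index lists into actual short-term skeleton sub-sequences."""
    return [[sequence[t] for t in idx] for idx in clips]
```

For a 100-frame sequence, `sparse_sample(100, 4, 10)` yields clips starting at frames 0, 30, 60, and 90, so the short sequences cover the whole action while keeping a fixed length for the shared LSTM.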
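The second step of contribution 3 feeds every short sequence to an LSTM with shared weights and fuses the outputs. The minimal pure-Python sketch below assumes a single-layer LSTM with small random weights, uses the final hidden state of each clip as its output, and fuses clips by averaging; the abstract does not say which fusion the thesis uses, and `TinyLSTM` and `temporal_feature` are hypothetical names, not a real library API.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class TinyLSTM:
    """Minimal single-layer LSTM; one instance (one set of weights) serves every clip."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim + 1  # weights act on [x_t, h_{t-1}, bias]
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g)
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_dim)]
            for gate in "ifog"
        }

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) for row in self.weights[gate]]

    def run(self, clip):
        """Read a clip frame by frame and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for frame in clip:
            z = list(frame) + h + [1.0]
            i = [sigmoid(v) for v in self._affine("i", z)]
            f = [sigmoid(v) for v in self._affine("f", z)]
            o = [sigmoid(v) for v in self._affine("o", z)]
            g = [math.tanh(v) for v in self._affine("g", z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


def temporal_feature(clips, lstm):
    """Encode every clip with the same LSTM and average the outputs (assumed fusion)."""
    outputs = [lstm.run(clip) for clip in clips]
    return [sum(col) / len(outputs) for col in zip(*outputs)]
```

Sharing one `TinyLSTM` across clips keeps the parameter count independent of how many short sequences are sampled; in practice a framework LSTM would replace this hand-written cell.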
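The final step, classifying the temporal and spatial features and fusing the results, is sketched below as score-level (late) fusion. This is an assumption: the abstract does not say whether fusion happens on features or on class scores. Here each stream's logits are turned into probabilities with a softmax and combined by a weighted average with a hypothetical weight `alpha`, and the predicted class is the arg-max of the fused scores.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(spatial_logits, temporal_logits, alpha=0.5):
    """Weighted average of the two streams' class probabilities; returns (probs, label)."""
    ps = softmax(spatial_logits)
    pt = softmax(temporal_logits)
    fused = [alpha * a + (1.0 - alpha) * b for a, b in zip(ps, pt)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return fused, label
```

With spatial logits `[2.0, 0.5, 0.1]` and temporal logits `[0.2, 1.5, 0.3]`, equal weighting favours class 0, while `alpha=0.2` (trusting the temporal stream more) flips the decision to class 1, which shows why the fusion weight is usually tuned on validation data.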
Keywords/Search Tags:GCNN, LSTM, action recognition, skeleton sequence, spatiotemporal features