
Research On Human Interactive Action Recognition Based On Skeletal Key Points

Posted on: 2024-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: F Gao
GTID: 2568306941489244
Subject: Information and Communication Engineering
Abstract/Summary:
Human interaction action recognition, a high-level semantic task in video understanding, lies at the intersection of action recognition, object detection, and visual relationship understanding. It has broad application prospects in autonomous driving, search engines, security surveillance, intelligent sports, the metaverse, and other fields. In recent years, with the continuous development of deep learning, interaction action analysis based on skeletal key points has made some progress, but challenges remain, such as how to construct interaction relations and how to represent interaction features. Because human-human and human-object interaction relationships are the basic units from which complex interaction scenes are formed, this thesis selects two subtasks of human interaction action analysis: two-person interaction action recognition and human-object interaction detection. It focuses on the use of skeletal key points in interaction actions and explores how graph convolution and attention mechanisms can model interaction relations.

For two-person interaction action recognition, an attentional interactive graph convolution network is proposed to address three problems in current methods: insufficient representation of spatio-temporal interaction features, interaction relations that are not explicitly represented, and weak causal relations. For spatial representation, an interaction attention encoding graph convolution module, consisting of a dynamic graph convolution unit and a static graph convolution unit, is proposed to encode spatial interaction features and construct mirror spatial graphs. For temporal representation, an interaction attention mask temporal convolution module is proposed that uses a multi-head cross-attention mechanism to extract temporal interaction relations. Experiments on an interaction dataset show that the proposed method achieves higher accuracy with fewer parameters than other mainstream methods.

For human-object interaction (HOI) detection, this thesis proposes a pose- and relationship-aware fusion network to address several problems of current end-to-end approaches: the randomness of initial predictions, the lack of prior knowledge about human and object locations and categories, and the neglect of the importance of human pose information. First, the single-stage end-to-end detection framework is split into two stages, and the encoder output is mined for structural and common-sense prior knowledge. Second, a human-object pair proposal and encoding module is proposed that uses the appearance, spatial, and label information of humans and objects to generate and encode candidate pairs; it is combined with a graph-attention relationship graph generation module to mine potential interactions. Finally, a multi-feature fusion module fuses multiple kinds of prior information, including pose information, to obtain the final interaction categories. Experimental results show that the proposed method achieves higher accuracy than other mainstream networks.
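The abstract names the parts of the spatial module (a static unit, a dynamic unit, and a mirror spatial graph) but not its equations. The Python sketch below shows one plausible reading under stated assumptions: the two skeletons are stacked into a single graph of 2V nodes, each joint is linked to the same joint on the other person (the assumed "mirror" edge), the static unit uses the standard GCN normalization D^-1/2 (A + I) D^-1/2, and the dynamic unit builds a data-dependent adjacency from a row-wise softmax of feature dot products. The function names, the toy three-joint skeleton, and the choice to sum the two units are illustrative and not taken from the thesis.

```python
import math


def build_mirror_adjacency(num_joints, bones):
    """Stack two skeletons into one graph with 2 * num_joints nodes.

    Node i is joint i of person A; node num_joints + i is joint i of person B.
    ASSUMPTION: a "mirror" edge joins the same joint index on both people.
    """
    n = 2 * num_joints
    adj = [[0.0] * n for _ in range(n)]
    for i, j in bones:
        for offset in (0, num_joints):  # copy each bone into both bodies
            adj[i + offset][j + offset] = adj[j + offset][i + offset] = 1.0
    for i in range(num_joints):  # mirror edge: A's joint i <-> B's joint i
        adj[i][i + num_joints] = adj[i + num_joints][i] = 1.0
    return adj


def normalize(adj):
    """Static unit's fixed graph: D^-1/2 (A + I) D^-1/2, as in standard GCNs."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 thanks to the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_adjacency(feats):
    """ASSUMPTION: dynamic unit = row-wise softmax of node feature dot products."""
    return [softmax([sum(a * b for a, b in zip(fi, fj)) for fj in feats]) for fi in feats]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def graph_conv(feats, adj, weight):
    """Aggregate with the static and dynamic graphs, sum them, then project."""
    static_part = matmul(normalize(adj), feats)
    dynamic_part = matmul(dynamic_adjacency(feats), feats)
    mixed = [[s + d for s, d in zip(rs, rd)] for rs, rd in zip(static_part, dynamic_part)]
    return matmul(mixed, weight)


if __name__ == "__main__":
    bones = [(0, 1), (1, 2)]                    # hypothetical 3-joint chain
    adj = build_mirror_adjacency(3, bones)      # 6 nodes: 3 per person
    feats = [[0.1 * k, 1.0] for k in range(6)]  # 2 channels per node
    out = graph_conv(feats, adj, [[1.0, 0.0], [0.0, 1.0]])
    print(len(out), len(out[0]))  # 6 2
```

Summing a fixed skeleton prior with a feature-driven graph lets the dynamic unit contribute cross-person edges that the fixed mirror graph lacks; a full network would add learned projections, nonlinearities, and batching over frames.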
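For the temporal module, the abstract states only that a multi-head cross-attention mechanism extracts time-domain interaction relations. A minimal sketch, assuming per-frame feature vectors for each person, queries taken from person A and keys/values from person B, channels split evenly across heads, and a causal mask as one guess at what the "mask" in the module's name refers to (frame t attends only to frames 0 through t). Learned query/key/value projections are omitted for brevity, and `cross_attention` is a hypothetical name.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_seq, context_seq, num_heads, causal=True):
    """Multi-head cross-attention over time between two people.

    query_seq:   T x C per-frame features of person A (queries)
    context_seq: T x C per-frame features of person B (keys and values)
    Learned Q/K/V projections are omitted; each head uses a slice of channels.
    ASSUMPTION: the mask is causal, so frame t only sees frames 0..t.
    """
    t_len, channels = len(query_seq), len(query_seq[0])
    assert len(context_seq) == t_len and channels % num_heads == 0
    d = channels // num_heads
    out = [[0.0] * channels for _ in range(t_len)]
    for h in range(num_heads):
        lo, hi = h * d, (h + 1) * d  # channel slice owned by this head
        for t in range(t_len):
            q = query_seq[t][lo:hi]
            visible = range(t + 1) if causal else range(t_len)
            scores = [sum(a * b for a, b in zip(q, context_seq[s][lo:hi])) / math.sqrt(d)
                      for s in visible]
            for w, s in zip(softmax(scores), visible):
                for c in range(lo, hi):
                    out[t][c] += w * context_seq[s][c]  # weighted sum of B's values
    return out


if __name__ == "__main__":
    person_a = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5], [1.0, 1.0, 0.0, 0.0]]
    person_b = [[0.2, 0.4, 1.0, 0.0], [0.6, 0.8, 0.0, 1.0], [1.0, 0.0, 0.5, 0.5]]
    fused = cross_attention(person_a, person_b, num_heads=2)
    print(fused[0])  # [0.2, 0.4, 1.0, 0.0]: frame 0 can only see B's frame 0
```

Calling the same function with the roles swapped (B attending to A) gives the other direction; the abstract does not say whether one or both directions are used.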
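The pair proposal and encoding module is described as combining appearance, spatial, and label information. The sketch below covers only the spatial and label parts, assuming axis-aligned boxes in (x1, y1, x2, y2) form with positive width and height, and a hypothetical class index 0 for "person". The spatial code (object-center offset normalized by the human box size, log width and height ratios, and IoU) is a common HOI encoding chosen here for illustration, not the thesis's definition; appearance features from the encoder would be concatenated in the same way.

```python
import math

PERSON_LABEL = 0  # hypothetical class index for "person"


def box_size(box):
    x1, y1, x2, y2 = box
    return x2 - x1, y2 - y1


def box_center(box):
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2


def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    (aw, ah), (bw, bh) = box_size(a), box_size(b)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def pair_spatial_encoding(human_box, object_box):
    """Object position and scale relative to the human box, plus their overlap."""
    hw, hh = box_size(human_box)
    ow, oh = box_size(object_box)
    (hx, hy), (ox, oy) = box_center(human_box), box_center(object_box)
    return [
        (ox - hx) / hw,     # horizontal offset in human-box widths
        (oy - hy) / hh,     # vertical offset in human-box heights
        math.log(ow / hw),  # relative width (0 when equal)
        math.log(oh / hh),  # relative height (0 when equal)
        iou(human_box, object_box),
    ]


def propose_pairs(detections, num_classes, score_threshold=0.5):
    """Pair every kept person with every other kept detection.

    Returns (human_idx, object_idx, feature) with indices into the kept list;
    feature = spatial encoding + one-hot object label.
    """
    kept = [d for d in detections if d["score"] >= score_threshold]
    pairs = []
    for h_idx, human in enumerate(kept):
        if human["label"] != PERSON_LABEL:
            continue
        for o_idx, obj in enumerate(kept):
            if o_idx == h_idx:
                continue
            one_hot = [0.0] * num_classes
            one_hot[obj["label"]] = 1.0
            feature = pair_spatial_encoding(human["box"], obj["box"]) + one_hot
            pairs.append((h_idx, o_idx, feature))
    return pairs


if __name__ == "__main__":
    dets = [
        {"box": (0, 0, 2, 4), "label": PERSON_LABEL, "score": 0.9},
        {"box": (2, 0, 4, 4), "label": 2, "score": 0.8},
        {"box": (5, 5, 6, 6), "label": 1, "score": 0.1},  # dropped: low score
    ]
    for h_idx, o_idx, feat in propose_pairs(dets, num_classes=3):
        print(h_idx, o_idx, feat)  # 0 1 [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

Normalizing offsets by the human box keeps the encoding roughly invariant to image scale, which is why relative rather than absolute coordinates are used in this sketch.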
Keywords/Search Tags: graph convolutional networks, interaction action recognition, human-object interaction detection, self-attention mechanism, skeletal keypoints