
Research On Action Recognition Algorithm Based On Multimodal Data

Posted on: 2023-04-24    Degree: Master    Type: Thesis
Country: China    Candidate: H D Mi    Full Text: PDF
GTID: 2568306617956489    Subject: Control Science and Engineering
Abstract/Summary:
Human action recognition is a hot research topic in computer vision and has been widely applied in human-computer interaction, autonomous driving, anomaly detection, and other fields. In vision-based human action recognition, many data modalities are used to analyze human behavior, including RGB, depth images, optical flow, and skeleton data. Different modalities carry heterogeneous and complementary information for action recognition, so action recognition based on multi-modal data has attracted extensive attention. We conduct research on action recognition based on multi-modal data, covering hand-object interaction recognition, human body action recognition, multi-modal data fusion, and human-machine collaboration. The main contents are as follows:

(1) In first-person hand-object interaction recognition, action recognition algorithms based on RGB data rely too heavily on scene information and cannot generalize when the specific semantics of the interacted object change. To address this problem, we propose to imitate how humans identify such actions: based on the position information of hands and objects in the spatial-temporal domain, a Transformer built on the self-attention mechanism automatically infers the interactions between targets, and the extracted interactions are used to classify actions. In addition, RGB data containing scene information and optical flow containing motion information are fused to improve robustness when spatial-temporal changes are insignificant. We therefore propose a three-stream relation reasoning network based on multi-modal data, in which the interaction relationships of each modality are reasoned separately and fused at the decision layer. Experiments show that our algorithm effectively improves recognition accuracy.

(2) In skeleton-based human action recognition, mainstream GCN-based algorithms require the skeleton topology to be set manually in advance, which is cumbersome. We propose to stack 3D human skeleton sequences into 3D skeleton point clouds, which ignores the topological differences between skeletons, and to model the point clouds with an algorithm based on PointNet++. Meanwhile, handcrafted features are designed to compensate for the spatiotemporal and motion information lost in the unordered point cloud. In addition, we construct five fusion strategies to fuse RGB data and 3D skeleton point clouds; these strategies improve the ability to discriminate actions with highly similar motion trajectories. Experiments on the NTU RGB+D 60 dataset show that our method achieves better performance.

(3) We develop a human-computer collaboration system for practical applications, with a human-computer interaction mode based on human body actions. First, a multi-view, multi-modal data acquisition system is built, and we collect the SDUACTIONS dataset for human action recognition. Then, we build an application system for human-robot collaboration tasks, equipped with pose estimation, human detection, multi-target tracking, and real-time action recognition. Finally, the system is linked with a mobile robot so that a human can control the robot with body actions.
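The self-attention mechanism underlying the relation reasoning in contribution (1) can be illustrated with a minimal sketch. This is the generic scaled dot-product attention, not the thesis's exact network: the tokens, projection matrices `Wq`/`Wk`/`Wv`, and dimensions are illustrative assumptions.

```python
import numpy as np

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over entity tokens
    (e.g. per-frame hand and object position embeddings). Each output
    token aggregates information from all other tokens, which is how
    pairwise interactions between targets can be inferred."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # pairwise similarity
    scores -= scores.max(-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(-1, keepdims=True)     # row-wise softmax
    return attn @ V

# Toy usage: 6 entity tokens (hands/objects across frames), 4-d embeddings.
rng = np.random.default_rng(0)
tokens = rng.random((6, 4))
out = self_attention(tokens, rng.random((4, 8)), rng.random((4, 8)), rng.random((4, 8)))
print(out.shape)  # (6, 8)
```

In the full network, the attended representations would be pooled and passed to a classifier; the thesis applies this reasoning per modality stream before decision-level fusion.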
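The decision-layer fusion used by the three-stream network can be sketched as a weighted average of per-stream softmax scores. The function name, equal default weights, and toy logits below are assumptions for illustration; the thesis does not specify its exact fusion weights.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def decision_fusion(stream_logits, weights=None):
    """Fuse per-stream class scores at the decision layer by a
    (optionally weighted) average of softmax probabilities.
    stream_logits: list of (batch, classes) arrays, one per stream."""
    probs = np.stack([softmax(l) for l in stream_logits])
    if weights is None:
        weights = np.ones(len(stream_logits)) / len(stream_logits)
    fused = np.tensordot(np.asarray(weights), probs, axes=1)
    return fused.argmax(-1), fused

# Toy usage: three streams (e.g. RGB, optical flow, relation), 3 classes.
logits = [np.array([[2.0, 0.0, 0.0]]),
          np.array([[0.0, 1.0, 0.0]]),
          np.array([[1.0, 0.0, 0.0]])]
pred, fused = decision_fusion(logits)
print(pred)  # [0]
```

Averaging probabilities rather than raw logits keeps each stream's contribution on a comparable scale regardless of its logit magnitudes.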
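The stacking of a skeleton sequence into an unordered point cloud, with handcrafted features restoring temporal and motion cues, can be sketched as follows. The specific features here (a normalized time stamp and frame-to-frame joint displacement) are illustrative assumptions, not necessarily the thesis's exact feature design.

```python
import numpy as np

def skeleton_to_pointcloud(seq):
    """Stack a skeleton sequence (T frames x J joints x 3 coords) into an
    unordered point cloud of T*J points, attaching handcrafted per-point
    features: a normalized time stamp and a frame-to-frame motion vector,
    which would otherwise be lost once the points are treated as a set."""
    T, J, _ = seq.shape
    # Normalized time index for each frame, broadcast over joints.
    t = np.repeat(np.linspace(0.0, 1.0, T), J).reshape(T, J, 1)
    # Frame-to-frame joint displacement (zero for the first frame).
    motion = np.zeros_like(seq)
    motion[1:] = seq[1:] - seq[:-1]
    # Concatenate xyz, time, and motion into a (T*J) x 7 point set.
    points = np.concatenate([seq, t, motion], axis=-1)
    return points.reshape(T * J, 7)

# Toy usage: 4 frames of a 25-joint skeleton (the NTU RGB+D joint count).
seq = np.random.rand(4, 25, 3)
cloud = skeleton_to_pointcloud(seq)
print(cloud.shape)  # (100, 7)
```

A PointNet++-style network can then consume this set directly, without any manually specified bone topology.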
Keywords/Search Tags:action recognition, multi-modal fusion, human-computer collaboration, skeleton-based action recognition, pointcloud classification