
The Method of Human-Object Interaction Action Recognition

Posted on: 2022-07-03  Degree: Master  Type: Thesis
Country: China  Candidate: Z J Yan  Full Text: PDF
GTID: 2518306512971989  Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the continuous development of deep learning and artificial intelligence, human action recognition has received increasing attention. It is widely applied in human-computer interaction, unmanned stores, security monitoring, patient care, virtual reality, and other fields. The goal of action recognition is to understand and analyze human actions from video image sequences, so accuracy and efficiency are particularly important. Because depth sensors can effectively avoid the influence of illumination, occlusion, environmental changes, and other factors, action recognition based on skeleton data has become a hot research direction in pattern recognition. In recent years, graph convolutional networks (GCNs) that model the human skeleton as a spatiotemporal graph have achieved excellent performance, but existing methods still have shortcomings, and research on human-object interaction in video has not yet achieved a breakthrough. For human-object interaction actions in a scene, the probability of misclassification is high, and the low recognition accuracy degrades the overall performance of the algorithm. Recognizing human-object interaction behaviors therefore remains an open problem with important research value. This thesis addresses the deficiencies of the above methods; the main research work is as follows:

(1) To address human-object interaction action recognition, an interaction detection network is first constructed to determine whether an interaction is present in an action. Because existing datasets provide little usable information beyond the human body, this thesis uses the labelImg annotation tool and the SiamRPN algorithm to accurately obtain the spatial positions of the person and the object for the interactive action categories in the dataset; these annotations are used to build the interaction detection network and to support subsequent work. The relative spatial relationship between the person and the object is then encoded as a feature representation, which is used for positional feature extraction after network modeling. Finally, the learned network parameters and the spatiotemporal position features of the person and the object are used to judge whether an interaction is present, which facilitates the subsequent study of human-object interaction recognition. The method is evaluated on the existing NTU RGB+D 60 dataset, where the average accuracy of the interaction judgment is 75%.
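The abstract does not specify the exact encoding of the person-object relationship. The following is a minimal sketch, assuming per-frame person and object bounding boxes in (x, y, w, h) pixel format (e.g. produced by a SiamRPN-style tracker), of how such a relative spatial relationship could be turned into a feature vector for an interaction detection network; all function and variable names are illustrative, not the thesis's actual implementation.

```python
import numpy as np

def encode_relative_position(person_box, object_box):
    """Encode the relative spatial relation between a person box and an
    object box, both given as (x, y, w, h) in pixel coordinates.
    Returns a small feature vector: normalized center offset, log size
    ratios, and intersection-over-union. Illustrative encoding only."""
    px, py, pw, ph = person_box
    ox, oy, ow, oh = object_box

    # Center offset, normalized by the person box size so the feature is
    # roughly scale-invariant.
    dx = ((ox + ow / 2) - (px + pw / 2)) / pw
    dy = ((oy + oh / 2) - (py + ph / 2)) / ph

    # Log size ratios between the object box and the person box.
    dw = np.log(ow / pw)
    dh = np.log(oh / ph)

    # Intersection-over-union as a simple overlap cue.
    ix1, iy1 = max(px, ox), max(py, oy)
    ix2, iy2 = min(px + pw, ox + ow), min(py + ph, oy + oh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = pw * ph + ow * oh - inter
    iou = inter / union if union > 0 else 0.0

    return np.array([dx, dy, dw, dh, iou], dtype=np.float32)

def encode_sequence(person_boxes, object_boxes):
    """Stack per-frame relative features into a (T, 5) array that a
    temporal interaction classifier could consume."""
    return np.stack([encode_relative_position(p, o)
                     for p, o in zip(person_boxes, object_boxes)])
```

A per-frame vector like this can then be fed, together with the skeleton features, to a binary classifier that decides whether the action involves an interaction at all.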
(2) To address the misclassification of the screened interactive action categories, and building on GCN-based human action recognition, this thesis considers the complementarity of multiple feature modalities for human-object interaction and proposes a multimodal deep-fusion recognition method for human-object interaction. The connections among multiple features are difficult to explore; when interactive actions between people and objects occur in the video, the RGB information of the interacting objects in the scene and the spatiotemporal relationship between the person and the object are used to effectively supplement the information and achieve action classification. In the network construction stage, the RGB information in the action scene is selected and preprocessed; a deep network extracts effective pixel contour features of the person and the object, and this information serves as a supplement for the final action recognition. This thesis also considers the spatiotemporal changes of the person and the object during the interaction: by modifying the deep network structure and the feature encoding, and by modeling the spatiotemporal features of the person and the object, the action information in the scene is further supplemented. Through model optimization, the latent complementary relationships among features are exploited, and a multimodal deep-fusion strategy is used for model fusion to improve interactive action classification. Experiments and analysis are carried out on NTU RGB+D 60, a large-scale skeleton action dataset. Compared with existing algorithms, the recognition accuracy of the proposed multimodal human-object interaction fusion method is improved, demonstrating the effectiveness of the proposed method.
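The abstract does not describe the exact fusion operator. A common way to realize multimodal deep fusion is score-level (late) fusion, where each modality's backbone produces class logits that are combined with a learned weight. The PyTorch sketch below illustrates that idea under the assumption of two backbones, a skeleton GCN and an RGB network; the module names, the learnable scalar weight, and the fusion rule are assumptions for illustration, not the thesis's actual design.

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Score-level fusion of a skeleton-GCN stream and an RGB stream.
    Each backbone is assumed to output per-class logits of shape
    (batch, num_classes); a learnable scalar balances the two modalities.
    Illustrative only: the thesis's actual fusion strategy may differ."""

    def __init__(self, skeleton_backbone: nn.Module, rgb_backbone: nn.Module):
        super().__init__()
        self.skeleton_backbone = skeleton_backbone
        self.rgb_backbone = rgb_backbone
        # Learnable scalar controlling the modality weighting.
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, skeleton_input, rgb_input):
        s_logits = self.skeleton_backbone(skeleton_input)  # (B, C)
        r_logits = self.rgb_backbone(rgb_input)            # (B, C)
        w = torch.sigmoid(self.alpha)                       # keep weight in (0, 1)
        # Weighted sum of the two score streams.
        return w * s_logits + (1.0 - w) * r_logits
```

Feature-level fusion (concatenating intermediate features before a shared classifier) is the other common option; which one the thesis adopts is not stated in the abstract.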
Keywords/Search Tags:Action recognition, Graph convolutional networks, Information supplement, Human-object interaction, Multimodal depth fusion