Research On Multimodal Human Action Recognition

Posted on: 2020-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Y He
Full Text: PDF
GTID: 2428330575456452
Subject: Information and Communication Engineering
Abstract/Summary:
Human action recognition is a research hotspot in computer vision with a wide range of applications, including biometrics, intelligent surveillance, and human-computer interaction. In vision-based human action recognition, the main input modalities are RGB, depth, and skeleton. Each modality captures a certain kind of information that is likely to be complementary to the others: some modalities capture global information, while others capture the local details of an action. Intuitively, fusing multiple modalities is expected to improve recognition accuracy. A further challenge in action recognition is how to properly model and exploit spatio-temporal information. To meet this challenge and make full use of the advantages of the different modalities, this thesis studies human action recognition based on multimodal fusion. The specific research is as follows.

Traditional Depth Motion Maps (DMMs) lose temporal information, while the traditional Fourier Temporal Pyramid (FTP) contains insufficient spatial information and fails to capture enough motion detail. To address this problem, the thesis proposes an action recognition algorithm based on DMMs and FTP: the good discrimination of motion appearance offered by DMMs and the strength of FTP in temporal modeling complement each other, covering the spatio-temporal information of the whole body movement. For the depth modality, features are extracted with Local Binary Pattern (LBP) descriptors. For the skeleton modality, static features and dynamic features of the vector difference between pairs of joints are used to capture more motion information, after which FTP features are extracted. Finally, a Support Vector Machine (SVM) classifies the fused modalities. Experimental results on public datasets show that this method achieves higher recognition accuracy than several existing methods.

The thesis also proposes a human action recognition framework that fuses depth and skeleton with a Convolutional Neural Network (CNN). For the depth modality, Adaptive Multiscale Depth Motion Maps (AM-DMMs) are proposed to capture shape and motion cues and to remedy the loss of temporal information in DMMs; an adaptive temporal window makes AM-DMMs robust to changes in action speed. For the skeleton modality, a concise and effective method is proposed to encode each skeleton sequence into three maps containing spatio-temporal information, termed Stable Joint Distance Maps (SJDMs), each describing a different spatial relationship between joints. Finally, a multi-channel CNN extracts discriminative features from the color-coded AM-DMMs and SJDMs for effective human action recognition. Experimental results on public datasets show that this method achieves higher recognition accuracy than several existing methods.
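As a rough illustration of the depth branch, a depth motion map is commonly built by accumulating thresholded frame-to-frame differences of a projected depth sequence. The sketch below follows that general recipe only; the noise threshold `eps`, the function name, and the toy moving-square sequence are illustrative assumptions, not values or code from the thesis.

```python
import numpy as np

def depth_motion_map(depth_frames, eps=10.0):
    """Accumulate thresholded frame-to-frame motion over a depth sequence.

    depth_frames: array of shape (T, H, W) holding one projected view
    (front, side, or top). Returns a single (H, W) motion map.
    """
    frames = np.asarray(depth_frames, dtype=np.float32)
    diffs = np.abs(np.diff(frames, axis=0))     # |frame_{t+1} - frame_t|
    motion = np.where(diffs > eps, diffs, 0.0)  # suppress small sensor noise
    return motion.sum(axis=0)                   # temporal accumulation

# Toy sequence: a bright square sliding one pixel per frame.
seq = np.zeros((4, 8, 8))
for t in range(4):
    seq[t, 2:4, t:t + 2] = 100
dmm = depth_motion_map(seq)  # (8, 8) map, nonzero only where motion occurred
```

LBP descriptors would then be computed on such a map; static background pixels contribute nothing because their frame differences fall below the threshold.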
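For the skeleton branch, the general idea behind joint-distance-map encodings can be sketched as follows: compute the Euclidean distance between every pair of joints in each frame and stack the per-frame distance vectors into a 2-D image that a CNN can consume. This is a generic sketch of that family of encodings, not the thesis's SJDM definition; the pair ordering, the normalization to [0, 255], and the random toy skeleton are all assumptions.

```python
import numpy as np
from itertools import combinations

def joint_distance_map(skeleton, scale=255.0):
    """Encode a skeleton sequence as a 2-D joint-distance image.

    skeleton: array of shape (T, J, 3) -- T frames, J joints in 3-D.
    Each column of the output is one frame; each row is the Euclidean
    distance between one pair of joints, normalized to [0, scale].
    """
    skel = np.asarray(skeleton, dtype=np.float32)
    T, J, _ = skel.shape
    pairs = list(combinations(range(J), 2))      # all unordered joint pairs
    dmap = np.empty((len(pairs), T), dtype=np.float32)
    for r, (i, j) in enumerate(pairs):
        dmap[r] = np.linalg.norm(skel[:, i] - skel[:, j], axis=1)
    lo, hi = dmap.min(), dmap.max()
    if hi > lo:                                  # guard against flat input
        dmap = (dmap - lo) / (hi - lo) * scale
    return dmap

# Toy sequence: 5 frames, 4 joints drifting over time.
rng = np.random.default_rng(0)
skel = rng.normal(size=(5, 4, 3)).cumsum(axis=0)
img = joint_distance_map(skel)  # shape (6, 5): C(4,2) pairs x 5 frames
```

Color-coding such maps (e.g. with a jet colormap) yields pseudo-RGB images, which is how distance-map encodings are typically fed to a multi-channel CNN.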
Keywords/Search Tags:human action recognition, multimodal fusion, convolutional neural network