Research On Human Action Recognition Method Based On Consistency Constraint And Shuffle Invariance

Posted on: 2022-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: Q H Y Shi
Full Text: PDF
GTID: 2568306728456564
Subject: Computer technology
Abstract/Summary:
Human action recognition is a core problem in computer vision and plays an important role in human-computer interaction, intelligent video surveillance, rehabilitation medicine, and other fields. With the development of deep learning, many deep learning-based methods have been applied to human action recognition. However, real-world scenes are complex, and how to process the data effectively and extract discriminative features remains an open problem in the field. This thesis mainly studies how to enhance the completeness and discriminability of action features. The details are as follows:

1) This thesis proposes a new neural network learning method to improve action recognition performance in video. Most human action recognition methods use a clip-level training strategy, which divides the video into multiple clips and trains the feature learning network by minimizing a clip classification loss. The video category is then predicted by voting over the clips of the same video. To obtain more effective action features, a new video-level feature learning method is proposed to train a 3D CNN and boost action recognition performance. Unlike clip-level training, which takes individual clips as input, the video-level learning network takes the entire video as input. A consistency constraint loss is defined to minimize the distance between clips of the same video in the voting space. In addition, a video-level loss function is defined to compute the video classification error.

2) Local key features in video are important for improving the accuracy of human action recognition. However, most end-to-end methods focus on learning global features from videos, and few works consider enhancing the local information in a feature. This thesis discusses how to automatically enhance the discriminative power of the local information in an action feature and thereby improve recognition accuracy. To address this problem, we assume that the critical level of each region for the action recognition task differs from region to region and does not change when the region locations are shuffled. We therefore propose a novel action recognition method called the shuffle-invariant network. In the proposed method, the shuffled video is generated by cutting the video into regular regions and randomly shuffling their locations, which augments the input data. The proposed network adopts a multitask framework consisting of one feature backbone network and three task branches: local critical feature shuffle-invariant learning, adversarial learning, and action classification. To enhance the local features, the feature response of each region is predicted by a local critical feature learning network. To train this network, an L1-based critical feature shuffle-invariant loss is defined to ensure that the ordered feature response list of these regions remains unchanged after region location shuffling. Adversarial learning is then applied to eliminate the noise introduced by the region shuffle. Finally, the action classification network combines these two tasks to jointly guide the training of the feature backbone network and obtain more effective action features. In the testing phase, only the action classification network is used to identify the action category of the input video.

We verify the proposed methods on the HMDB51 and UCF101 datasets. The experimental results show that our approaches effectively improve the accuracy of human action recognition.
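
The loss terms described in 1) and 2) can be illustrated with a small sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the consistency term is taken as the mean squared distance of each clip's class scores from the video mean, the video-level term as cross-entropy on the averaged clip scores, and the shuffle-invariant term as an L1 gap between each region's response before and after shuffling. All function names are hypothetical and are not taken from the thesis.

```python
import math
import random


def consistency_loss(clip_probs):
    """Assumed form: mean squared distance between each clip's class
    scores and the mean score vector of the whole video."""
    n, k = len(clip_probs), len(clip_probs[0])
    mean = [sum(p[c] for p in clip_probs) / n for c in range(k)]
    return sum((p[c] - mean[c]) ** 2 for p in clip_probs for c in range(k)) / n


def video_level_loss(clip_probs, label):
    """Assumed form: cross-entropy of the averaged (voted) clip scores
    against the video label."""
    voted = sum(p[label] for p in clip_probs) / len(clip_probs)
    return -math.log(voted)


def shuffle_regions(regions, rng):
    """Randomly permute region locations.
    Returns (perm, shuffled) where shuffled[j] == regions[perm[j]]."""
    perm = list(range(len(regions)))
    rng.shuffle(perm)
    return perm, [regions[i] for i in perm]


def shuffle_invariant_l1(resp_original, resp_shuffled, perm):
    """Assumed form: mean L1 gap between the response predicted for a
    region in the original video and the response predicted for the same
    region at its new location in the shuffled video."""
    n = len(perm)
    return sum(abs(resp_original[perm[j]] - resp_shuffled[j]) for j in range(n)) / n


# Toy usage: three clips of one video, two classes, true label 1.
clips = [[0.3, 0.7], [0.2, 0.8], [0.4, 0.6]]
total = video_level_loss(clips, 1) + consistency_loss(clips)

# Toy usage: four regions; responses that follow the regions give zero loss.
perm, shuffled = shuffle_regions(["r0", "r1", "r2", "r3"], random.Random(0))
responses = [0.1, 0.6, 0.2, 0.1]
followed = [responses[i] for i in perm]
zero_gap = shuffle_invariant_l1(responses, followed, perm)
```

In this sketch the consistency term is zero when every clip of a video casts the same vote, and the shuffle-invariant term is zero when each region keeps its predicted response wherever it is moved. The adversarial branch and the 3D CNN backbone are omitted.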
Keywords/Search Tags:Human action recognition, Video-level training, Key region, Shuffle invariance