Font Size: a A A

Human Action Recognition In Video Based On Deep Learning

Posted on:2021-04-13Degree:MasterType:Thesis
Country:ChinaCandidate:W ChangFull Text:PDF
GTID:2428330611956202Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Action recognition of human in video is a highly active research in the field of computer vision.With the deepening and intricacy of the convolutional neural network,as well as the huge improvement of computer hardware performance.The technology of action recognition has been developed rapidly and the accuracy rate has become higher and higher.In many application scenarios,such as video surveillance,smart home,urban security and other fields,action recognition method plays a vital role.The accuracy of action recognition in video is affected by multiple factors such as lighting conditions,various perspectives,composite backgrounds,and huge intra-class changes.It is generally believed that optical flow information represents the motion information of video without background information.Therefore,it can complement the image information,improving the effect of the two-stream model.But,the cost of calculating optical flow information is high.Usually ten frames before and after a frame are used as optical flow information,which may be too long or too short to capture the useful motion features for different actions.Moreover,the motion features extracted by the optical flow information is at a low level with noise,and it is not clear what useful functions optical streaming provides for action recognition.To solve these issues,we propose a novel two-stream framework to recognize actions based on a high-level motion feature,pose estimation,in place of the optical flow.We observe that many actions have specific characters of pose.Therefore,our two-stream framework uses the 2D human pose estimation as the motion feature to remove the redundant background information and highlight the description of the human motion.By getting only human pose estimation corresponding to the image frames,our approach can effectively reduce the amount of calculation.Moreover,to effectively fuse the features from the twostream framework,we add the attention mechanism in the fusion layer,which not only highlights the local features,but also allow the network to give distinct attention between original image frames and pose estimation.We conduct extensive experiments on two challenging human action datasets: HMDB-51 and UCF-101.The experimental results illustrate that our model outperforms the state-of-the-art methods in terms of accuracy,especially in the sports scenes that can clearly distinguish the human pose estimation.In order to make the algorithm studied in this paper applicable to real life scenarios,the article introduces an action recognition system at the end.In order to allow the system to be used on multiple platforms,we save the trained model and deploy it on the We Chat Mini Program platform,and integrate the system on this platform using the current popular frontend and back-end technologies.
Keywords/Search Tags:Convolutional Neural Network, Two-Stream Network, Densely Connected Convolutional Networks, Human Pose Estimation
PDF Full Text Request
Related items