Research On Video Action Recognition Based On Deep Neural Networks

Posted on: 2020-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z S Xu
GTID: 2428330590484482
Subject: Circuits and Systems
Abstract/Summary:
Video action recognition, which aims to analyze and classify human actions in video, is a challenging research topic in computer vision. It has a wide range of applications in video retrieval, intelligent medical monitoring, human-computer interaction, and traffic security. However, interference factors such as camera motion, background clutter, and viewpoint changes make it difficult to recognize human behavior accurately in video scenes. Traditional approaches extract hand-crafted features designed by human experts from the video and then pass them to a classifier. In practical applications, it is difficult to extract general and appropriate features, because the same behavior can differ significantly in background and motion details. Moreover, these traditional methods work only on video of simple scenes and do not extend to complex video. Since deep neural networks have achieved major breakthroughs in natural language processing, video classification, and image classification, video action recognition based on deep neural networks has attracted a growing number of researchers. In general, deep neural networks such as convolutional neural networks (CNNs) are used for feature extraction and classification.

This thesis studies video action recognition methods based on deep learning. First, the theoretical basis of deep learning and common human action recognition methods are introduced in detail. Second, the two-stream video action recognition model proposed by Simonyan et al. is studied in depth, and three sets of comparative experiments are designed to explore its performance under different initial learning rates, learning rate adjustment strategies, and CNN models.

Building on this exploration, the thesis makes a series of improvements to address the model's shortcomings and proposes a video action recognition method that combines multi-modal information with a second-order pooling mechanism. It involves four main improvements. First, a well-performing deep residual network is applied in the two-stream model to extract enhanced features from the video. Second, the model structure is modified so that second-order pooling replaces global average pooling, in order to extract second-order features. Third, a new information modality is extracted from optical flow images and fused into the model to improve its ability to capture motion features in the video. Fourth, weighted fusion is adopted for the architecture, which performs better than average fusion. The architecture is trained and evaluated on the standard video action benchmarks UCF-101 and HMDB-51, where it is competitive with other mainstream methods.
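
The abstract names second-order pooling and weighted fusion without giving formulas, so the following is only a minimal sketch of what these two ideas commonly look like, not the thesis's actual implementation. It assumes second-order pooling means the covariance of the spatial feature vectors (other variants and normalisations exist), and that fusion is a weighted sum of per-stream class probabilities. All function names, the example scores, and the weights are hypothetical.

```python
import math


def global_average_pool(features):
    """First-order pooling: mean of N feature vectors of length C."""
    n = len(features)
    c = len(features[0])
    return [sum(f[j] for f in features) / n for j in range(c)]


def second_order_pool(features):
    """Second-order pooling as a C x C covariance matrix (assumed variant).

    `features` holds N feature vectors, e.g. one per spatial location of a
    CNN feature map. The result captures pairwise channel interactions,
    which a plain average discards.
    """
    n = len(features)
    c = len(features[0])
    mean = global_average_pool(features)
    centered = [[f[j] - mean[j] for j in range(c)] for f in features]
    return [
        [sum(x[i] * x[j] for x in centered) / n for j in range(c)]
        for i in range(c)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(stream_scores, weights):
    """Fuse per-stream class scores with a normalised weighted sum.

    Equal weights reduce this to average fusion.
    """
    total_w = sum(weights)
    num_classes = len(stream_scores[0])
    return [
        sum(w * s[k] for w, s in zip(weights, stream_scores)) / total_w
        for k in range(num_classes)
    ]


# Toy feature map: two spatial locations, two channels.
toy_features = [[1.0, 2.0], [3.0, 6.0]]
pooled = second_order_pool(toy_features)

# Hypothetical logits from a spatial (RGB) and a temporal (optical flow) stream.
spatial = softmax([2.0, 1.0, 0.1])
temporal = softmax([0.5, 2.5, 0.3])
fused = weighted_fusion([spatial, temporal], weights=[1.0, 1.5])
predicted_class = max(range(len(fused)), key=lambda k: fused[k])
```

In this sketch the stream weights are fixed by hand; in practice they would be chosen on a validation set, and a third stream for the additional modality would simply add another entry to both lists.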
Keywords/Search Tags: video action recognition, deep learning, deep residual network, multi-input, second-order pooling, weighted fusion