Research On Video Behavior Recognition Method Based On Deep Learning

Posted on: 2021-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: L Guo
Full Text: PDF
GTID: 2428330620963433
Subject: Electronics and Communications Engineering
Abstract/Summary:
In recent years, with the development of smart city construction, demand for security applications has grown, and recognizing behavior in video has become an important research direction. In traditional video behavior recognition methods, features are designed by hand, whereas in methods based on deep learning, useful features are learned automatically. Beyond intelligent security, video behavior recognition is also widely used in human-computer interactive games, medical care, behavior re-identification, and other fields. Most current deep-learning-based methods treat all features extracted by the network equally, so the features that matter most to the recognition result receive no special attention during recognition. In this thesis, two attention mechanisms from the field of computer vision are introduced to build two video behavior recognition networks.

First, a video behavior recognition network based on the squeeze-and-excitation mechanism is constructed. The temporal segment network is used as the basic framework, and a residual network with the squeeze-and-excitation mechanism is designed as the backbone of both the temporal and the spatial network. Through the squeeze and excitation operations, the features extracted by the network are weighted along the channel dimension, so that different features receive different weights, which improves recognition accuracy. The temporal segment network first divides a video into three segments and extracts stacked optical flow and RGB video frames from each segment as the inputs of the temporal and spatial networks respectively, each of which makes a preliminary prediction of the video behavior. Then the temporal and spatial network predictions of each segment are fused to obtain video-level temporal and spatial network predictions. Finally, the video-level predictions of the temporal and spatial networks are fused to obtain the final classification result. Training proceeds in three steps: first, the spatial network based on the squeeze-and-excitation mechanism is trained on the large-scale ImageNet dataset; then, the temporal network based on the squeeze-and-excitation mechanism is trained with a cross-training strategy; finally, the trained temporal and spatial network parameters are taken as initial values to train the spatial and temporal networks with the squeeze-and-excitation mechanism and obtain the final recognition result. Experiments on the UCF101 and HMDB51 datasets show that the accuracy is improved.

Second, a video behavior recognition network based on the convolutional block attention module is constructed. The temporal segment network is again used as the basic framework. To better match how humans recognize and understand video behavior, different network structures are adopted for the temporal and spatial networks within the temporal segment network: a BN-GoogLeNet with the convolutional block attention module is designed as the temporal backbone, and a residual network with the convolutional block attention module is used as the spatial backbone. Through the convolutional block attention module, the features extracted by the network are weighted along both the channel and the spatial dimensions, so that different features receive different weights, which improves recognition accuracy. The pre-training strategy is the same as for the network based on the squeeze-and-excitation mechanism. Experiments on the UCF101 and HMDB51 datasets show that the accuracy is improved.
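The channel weighting and the two fusion steps described in the abstract can be illustrated with a small, dependency-free sketch. It is not the thesis implementation: the function names, the toy weight matrices, the omission of bias terms, the use of averaging for segment fusion, and the 1 : 1.5 stream weights are all assumptions chosen for illustration, since the abstract only says that the predictions "are fused".

```python
import math


def squeeze_excite(features, w1, w2):
    """Reweight channels with a squeeze-and-excitation step.

    features: C channels, each a flat list of H*W activations.
    w1: (C/r) x C matrix, w2: C x (C/r) matrix. Both are hypothetical
    learned weights; bias terms are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(channel) / len(channel) for channel in features]
    # Excitation: fully connected + ReLU, then fully connected + sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    scale = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Reweight: each activation is multiplied by its channel's scale.
    weighted = [[s * v for v in channel] for s, channel in zip(scale, features)]
    return weighted, scale


def segment_consensus(segment_scores):
    """Fuse per-segment class scores into one video-level score vector.

    Averaging is an assumption; the abstract does not name the fusion rule.
    """
    n = len(segment_scores)
    return [sum(per_class) / n for per_class in zip(*segment_scores)]


def fuse_streams(spatial, temporal, w_spatial=1.0, w_temporal=1.5):
    """Weighted average of the spatial and temporal video-level scores.

    The default weights are illustrative, not taken from the thesis.
    """
    total = w_spatial + w_temporal
    return [
        (w_spatial * s + w_temporal * t) / total
        for s, t in zip(spatial, temporal)
    ]


if __name__ == "__main__":
    # Two channels with two activations each, reduction ratio r = 2.
    weighted, scale = squeeze_excite(
        [[1.0, 3.0], [2.0, 2.0]], w1=[[1.0, 1.0]], w2=[[0.0], [1.0]]
    )
    print("channel scales:", scale)

    # Three segments, two classes, for each stream.
    spatial = segment_consensus([[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]])
    temporal = segment_consensus([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]])
    print("fused scores:", fuse_streams(spatial, temporal))
```

In a real network the squeeze-and-excitation step would sit inside each residual block and operate on tensors, and the convolutional block attention module would add a spatial weighting map on top of the channel weights; the sketch covers only the channel case.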
Keywords/Search Tags:video behavior recognition, squeeze and excitation mechanism, convolution block attention module, temporal segment network, two stream convolution network