| Human action recognition takes a video or image sequence as input and classifies the actions of the people in it according to features extracted from the video. As a core technology of video understanding, human action recognition has broad application prospects and significant social value in real-world scenarios such as video surveillance and information retrieval. With the development of deep learning, deep-learning-based action recognition methods have received extensive attention from researchers worldwide. At present, the two-stream convolutional neural network is one of the most important approaches in deep-learning-based human action recognition research. It consists of two components, a spatial stream and a temporal stream: the motion information extracted by the temporal stream supplements the appearance information extracted by the spatial stream, which effectively improves recognition accuracy. However, both streams use convolution kernels of a single size, so they cannot capture objects at different scales in complex video scenes and misclassify some easily confused actions. This paper studies this problem, and its main contributions are as follows. 1. To address the inability of the two-stream convolutional neural network to extract multi-scale features from video, a human action recognition algorithm based on a multi-scale two-stream convolutional neural network is proposed. Applying pyramidal convolution to the spatial stream and the temporal stream separately enhances each stream's ability to recognize objects of different scales: pyramidal convolution uses kernels of different sizes to capture multi-scale information, which enables multi-scale recognition and improves action recognition accuracy. 2. To address the problem that neurons cannot adaptively select their convolution kernel size, which lowers the recognition rate for objects at different scales, a multi-scale two-stream convolutional neural network recognition algorithm fused with an attention mechanism is proposed. The algorithm uses channel attention to fuse the feature maps output by kernels of different sizes, further integrating information across scales. In addition, obtaining attention weights by reducing the dimensionality of the channel feature vector causes information loss, so one-dimensional convolution is used instead to avoid this reduction and obtain more accurate attention weights. Finally, rather than having all kernel branches learn their attention weights from one shared feature vector, multiple one-dimensional convolutions, one per kernel branch, generate a separate feature vector for each kernel from which its attention weights are learned. This makes the kernels more independent, reduces the coupling between them, and further improves recognition accuracy. This paper verifies the effectiveness of the proposed method through experiments on the UCF-101 dataset and an analysis of the results. |
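
The abstract does not say how the spatial and temporal predictions are combined. As a minimal sketch, assuming late fusion by a weighted average of the two streams' softmax scores (one common choice for two-stream networks, not necessarily the one used in this work), the complementary role of the temporal stream can be illustrated in NumPy. The function names, the 0.5 weight and the example scores are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(logits - logits.max())
    return e / e.sum()


def two_stream_predict(spatial_logits, temporal_logits, w_spatial=0.5):
    """Hypothetical late fusion: weighted average of class probabilities.

    spatial_logits: class scores from the appearance (RGB frame) stream
    temporal_logits: class scores from the motion stream (typically optical flow)
    w_spatial: assumed weight of the spatial stream, in [0, 1]
    """
    p_spatial = softmax(np.asarray(spatial_logits, dtype=float))
    p_temporal = softmax(np.asarray(temporal_logits, dtype=float))
    p = w_spatial * p_spatial + (1.0 - w_spatial) * p_temporal
    return int(np.argmax(p)), p


if __name__ == "__main__":
    spatial = [2.0, 1.9, 0.0]   # appearance alone is unsure between classes 0 and 1
    temporal = [0.0, 0.0, 5.0]  # motion strongly indicates class 2
    label, probs = two_stream_predict(spatial, temporal)
    print(label)  # fused label is 2
```

With equal weights, a confident temporal stream can override a spatial stream that hesitates between two visually similar classes; setting `w_spatial=1.0` recovers the spatial stream's own decision.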
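
The multi-scale idea behind pyramidal convolution can be sketched in NumPy: the same input is processed by parallel branches whose kernels have different sizes (3, 5 and 7 here, an assumed choice), each zero-padded to keep the spatial size, and the branch outputs are concatenated along the channel axis. The helper names are hypothetical, the loop-based convolution favours clarity over speed, and the grouped convolutions that pyramidal convolution typically applies at larger kernel sizes to limit cost are omitted.

```python
import numpy as np


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive zero-padded ('same') 2-D cross-correlation, stride 1.

    x: input feature map, shape (C_in, H, W)
    w: kernel bank, shape (C_out, C_in, k, k) with odd k
    returns: output feature map, shape (C_out, H, W)
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H and W
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window centred at (i, j)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def pyramidal_conv(x: np.ndarray, kernels: list) -> np.ndarray:
    """Multi-scale block: one branch per kernel size, outputs concatenated on channels."""
    return np.concatenate([conv2d_same(x, w) for w in kernels], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8, 8))  # toy 4-channel feature map
    sizes = (3, 5, 7)                   # assumed pyramid levels
    kernels = [rng.standard_normal((2, 4, k, k)) for k in sizes]
    y = pyramidal_conv(x, kernels)
    print(y.shape)  # (6, 8, 8): 2 output channels from each of the 3 kernel sizes
```

Each branch sees a k x k neighbourhood per output pixel, so small kernels respond to fine detail while larger ones cover wider context; in the multi-scale two-stream setting one such block would sit in each stream.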
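
The information-loss argument can be made concrete with a sketch in the style of ECA-Net (an assumption; the abstract names no specific module). Squeeze-and-excitation-style attention passes the C-dimensional pooled vector through a bottleneck of C/r units, whereas here a 1-D convolution over neighbouring channels maps C values directly to C weights, so no channel is squeezed out. The function names and kernel values are hypothetical; in a network the kernel would be learned.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention_1d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """ECA-style channel attention without dimensionality reduction (sketch).

    x: feature map, shape (C, H, W)
    kernel: 1-D weights of odd length k, with k <= C (hypothetical values)
    returns: x re-weighted channel-wise, same shape as x
    """
    if kernel.size > x.shape[0]:
        raise ValueError("kernel must not be longer than the number of channels")
    z = x.mean(axis=(1, 2))  # global average pooling: one descriptor per channel, (C,)
    # 1-D cross-correlation across neighbouring channels; zero padding keeps length C,
    # so the descriptor is never reduced to C/r as in a squeeze-and-excitation bottleneck
    logits = np.convolve(z, kernel[::-1], mode="same")
    weights = sigmoid(logits)  # (C,) attention weights in (0, 1)
    return x * weights[:, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 4, 4))
    y = channel_attention_1d(x, np.array([0.25, 0.5, 0.25]))
    print(y.shape)  # (8, 4, 4)
```

Besides avoiding the bottleneck, the 1-D kernel is small: for C = 256 and r = 16, the two fully connected layers of a squeeze-and-excitation block hold 2 x 256 x 16 = 8192 weights (ignoring biases), while a length-3 kernel holds 3.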
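
One plausible reading of the attention-based fusion of kernel branches, sketched under assumptions in the style of selective-kernel networks: the branch outputs are summed and pooled into one channel descriptor, a separate 1-D convolution per branch turns it into branch-specific logits (instead of one shared mapping), and a softmax across branches gives, for each channel, weights that sum to 1 and select among the kernel sizes. Which vector feeds each 1-D convolution, as well as all names and kernel values, are assumptions not given in the abstract.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = 0) -> np.ndarray:
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def conv1d_same(v: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Zero-padded 1-D cross-correlation that keeps len(v) (assumes len(v) >= len(kernel))."""
    return np.convolve(v, kernel[::-1], mode="same")


def fuse_branches(branches: list, branch_kernels: list) -> np.ndarray:
    """Attention-weighted fusion of multi-scale branch outputs (hypothetical sketch).

    branches: B arrays of shape (C, H, W), one per convolution kernel size
    branch_kernels: B odd-length 1-D kernels, one per branch (learned in practice)
    returns: fused feature map, shape (C, H, W)
    """
    if len(branches) != len(branch_kernels):
        raise ValueError("need exactly one 1-D kernel per branch")
    stacked = np.stack(branches)                # (B, C, H, W)
    s = stacked.sum(axis=0).mean(axis=(1, 2))   # pooled descriptor of the summed branches, (C,)
    # Separate 1-D convolution per branch: branch-specific logits, no C -> C/r reduction
    logits = np.stack([conv1d_same(s, k) for k in branch_kernels])  # (B, C)
    a = softmax(logits, axis=0)                 # per channel, weights over branches sum to 1
    return np.einsum("bc,bchw->chw", a, stacked)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    branches = [rng.standard_normal((8, 5, 5)) for _ in range(3)]  # e.g. 3x3, 5x5, 7x7 outputs
    kernels = [rng.standard_normal(3) for _ in range(3)]
    print(fuse_branches(branches, kernels).shape)  # (8, 5, 5)
```

If every branch used the same 1-D kernel, the logits would coincide, all weights would be 1/B and the fusion would collapse to a plain average; distinct per-branch parameters are what allow each channel to favour a particular kernel size.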