In recent years, with the development of science and technology, behavior recognition technology has become increasingly mature. However, it remains difficult to capture both the complementary appearance information in still frames and the motion information between frames. Two-stream convolutional neural networks have attracted wide attention because they can capture spatio-temporal information. Nevertheless, factors in video such as noise and illumination changes, as well as the long duration of behaviors in long videos, reduce recognition accuracy. Building on the two-stream convolutional neural network and targeting these problems in behavior recognition, this paper proposes the following strategies to improve the recognition rate:

(1) The scSE module is used to filter image features, and a two-stream network framework based on scSE is proposed. The model attends to inter-channel information, assigns greater weight to behavior-related features, and weakens the influence of background information. The features processed by scSE are visualized and the results are analyzed. The experimental results show that the scSE module focuses on important features and thereby improves the recognition rate of the network.

(2) Based on the scSE-fused two-stream framework, a "segmentation-fusion" strategy is proposed, and the scSE_BNInception two-stream network is built on the BNInception network. This network better handles recognition in long temporal videos while filtering features. First, the original video is divided into K temporal segments of equal length with no overlap. Then, RGB video frames and optical flow images are sparsely sampled from each segment and fed into the scSE_BNInception two-stream network. Finally, the recognition results of the K segments are fused. Compared with algorithms such as the two-stream convolutional neural network and the temporal segment network (TSN), scSE_BNInception improves behavior recognition accuracy while maintaining running speed.

(3) ResNet101 is used to construct the two-stream model. On this basis, scSE convolution layers are added to filter features and reduce noise interference, and non-local layers are added to model long-distance dependencies and obtain global information. The resulting network is named SC_NLResNet. Experiments on the UCF101 and HMDB51 datasets, comparing it with scSE_BNInception and other algorithms, show that SC_NLResNet effectively improves recognition accuracy on videos affected by noise, illumination changes and other factors.

In summary, on the basic framework of the scSE-fused two-stream network, scSE_BNInception improves the recognition rate on long temporal videos while maintaining speed, and SC_NLResNet better handles the influence of noise and illumination changes. On UCF101 and HMDB51, the recognition rates are 96.4% and 71.3% for scSE_BNInception and 96.9% and 76.2% for SC_NLResNet, respectively.
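The scSE module described in contribution (1) recalibrates a feature map along a channel path and a spatial path at the same time. The following is a minimal NumPy sketch of a generic concurrent spatial and channel squeeze-and-excitation forward pass, assuming a single (C, H, W) feature map, bias-free weights, a reduction ratio r, and aggregation of the two paths by element-wise addition (as in several public scSE implementations; the abstract does not say which aggregation the thesis uses). The function name `scse` and its weight arguments are hypothetical, and random weights stand in for trained parameters.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def scse(u: np.ndarray, w1: np.ndarray, w2: np.ndarray, w_s: np.ndarray) -> np.ndarray:
    """Concurrent spatial and channel SE on a feature map u of shape (C, H, W).

    w1: (C // r, C) and w2: (C, C // r) are the channel-excitation FC weights.
    w_s: (C,) is a 1x1 convolution that squeezes the C channels to one map.
    Biases are omitted for brevity (assumption).
    """
    # cSE: global average pooling squeezes each channel to one number -> (C,)
    z = u.mean(axis=(1, 2))
    # FC -> ReLU -> FC -> sigmoid yields one gate in (0, 1) per channel
    channel_gate = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    u_cse = u * channel_gate[:, None, None]
    # sSE: 1x1 conv squeezes channels -> (H, W), sigmoid gives one gate per pixel
    spatial_gate = sigmoid(np.tensordot(w_s, u, axes=([0], [0])))
    u_sse = u * spatial_gate[None, :, :]
    # Aggregate the two recalibrated maps by element-wise addition (assumption)
    return u_cse + u_sse


rng = np.random.default_rng(0)
C, H, W, r = 8, 7, 7, 2
u = rng.standard_normal((C, H, W))
out = scse(u, rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r)),
           rng.standard_normal(C))
print(out.shape)  # (8, 7, 7)
```

Because both gates lie in (0, 1), each output element is the input scaled by a factor between 0 and 2, so features that score low on both paths (for example, background regions) are suppressed relative to those that score high.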
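The "segmentation-fusion" strategy in contribution (2) divides a video into K equal, non-overlapping segments, samples sparsely from each, and fuses the per-segment results. A minimal sketch of that sampling and fusion, assuming one randomly drawn frame index per segment and simple averaging as the fusion function (the abstract does not name the fusion function); `sample_segment_indices` and `segmental_consensus` are hypothetical helper names. In the full pipeline, the sampled indices would select both the RGB frames and the optical flow images fed to the two streams.

```python
import random
from typing import List, Sequence


def sample_segment_indices(num_frames: int, k: int, rng: random.Random) -> List[int]:
    """Split [0, num_frames) into k equal-length, non-overlapping segments
    (up to integer rounding) and draw one random frame index from each."""
    if k <= 0 or num_frames < k:
        raise ValueError("need at least one frame per segment")
    # Segment i covers the half-open range [bounds[i], bounds[i + 1])
    bounds = [i * num_frames // k for i in range(k + 1)]
    return [rng.randrange(bounds[i], bounds[i + 1]) for i in range(k)]


def segmental_consensus(segment_scores: Sequence[Sequence[float]]) -> List[float]:
    """Fuse per-segment class scores by averaging them class by class."""
    k = len(segment_scores)
    return [sum(col) / k for col in zip(*segment_scores)]


rng = random.Random(0)
print(sample_segment_indices(90, 3, rng))  # one index in each of [0, 30), [30, 60), [60, 90)
print(segmental_consensus([[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]]))
```

Sampling one frame per segment keeps the per-video cost fixed at K forward passes regardless of video length, which is consistent with the abstract's claim that the strategy handles long videos without sacrificing speed.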
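Both streams of the network produce class scores that must be combined into one prediction. The abstract does not describe how the RGB and optical-flow streams are fused, so the following is only an illustrative late-fusion sketch: it converts each stream's logits to probabilities and takes a weighted average. The function names, the equal default weights, and the 1.5 flow weight in the usage lines are assumptions, not values from the thesis.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Shift by the maximum logit for numerical stability
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(rgb_logits: Sequence[float], flow_logits: Sequence[float],
                     rgb_weight: float = 1.0, flow_weight: float = 1.0) -> List[float]:
    """Weighted average of spatial (RGB) and temporal (optical flow) class
    probabilities. The weights are hypothetical tuning parameters."""
    p_rgb = softmax(rgb_logits)
    p_flow = softmax(flow_logits)
    total = rgb_weight + flow_weight
    return [(rgb_weight * a + flow_weight * b) / total for a, b in zip(p_rgb, p_flow)]


# Hypothetical logits for three classes; the flow stream is weighted more heavily
scores = fuse_two_streams([2.0, 0.5, 0.1], [0.3, 1.8, 0.2], flow_weight=1.5)
predicted = max(range(len(scores)), key=scores.__getitem__)
print(predicted)  # 1
```

Here the RGB stream alone would favor class 0 and the flow stream class 1; with the heavier flow weight, the fused prediction follows the motion cue.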
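The non-local layer in contribution (3) lets every space-time position aggregate features from all other positions, which is how it captures long-distance dependencies. Below is a minimal NumPy sketch of the embedded-Gaussian variant from the non-local neural networks paper. It assumes the feature map has already been flattened to N = T*H*W positions and omits the batch normalization that normally follows the output projection; which variant SC_NLResNet uses is not stated in the abstract, and the weight names are hypothetical. Zero-initializing the output projection `w_z` makes the block start as an identity mapping, mirroring the zero-initialized normalization scale used in that paper.

```python
import numpy as np


def softmax_rows(a: np.ndarray) -> np.ndarray:
    # Subtract each row's maximum for numerical stability before exponentiating
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)


def non_local(x, w_theta, w_phi, w_g, w_z):
    """Embedded-Gaussian non-local block on x of shape (N, C).

    N = T*H*W flattened space-time positions. w_theta, w_phi, w_g have shape
    (C, C_inner); w_z has shape (C_inner, C).
    """
    theta = x @ w_theta                  # (N, C_inner) query embeddings
    phi = x @ w_phi                      # (N, C_inner) key embeddings
    g = x @ w_g                          # (N, C_inner) value embeddings
    attn = softmax_rows(theta @ phi.T)   # (N, N) pairwise weights, rows sum to 1
    y = attn @ g                         # each position aggregates all positions
    return x + y @ w_z                   # residual connection keeps shape (N, C)


rng = np.random.default_rng(0)
T, H, W, C, Ci = 4, 3, 3, 16, 8
x = rng.standard_normal((T * H * W, C))
w_theta, w_phi, w_g = (0.1 * rng.standard_normal((C, Ci)) for _ in range(3))
z = non_local(x, w_theta, w_phi, w_g, np.zeros((Ci, C)))  # zero w_z: identity at init
```

Because the attention matrix is N x N, memory and compute grow quadratically with the number of space-time positions, so the placement of such layers in a deep backbone is a cost trade-off.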