| Human action recognition is a key technique in video understanding, with applications in fields such as intelligent security, human-computer interaction, and intelligent education. Traditional action recognition methods require hand-crafted feature extractors for each activity, which makes them dependent on human design choices and leads to poor generalization and low recognition performance. In recent years, deep learning algorithms based on three-dimensional (3D) convolutional neural networks have achieved excellent results in action recognition, but model structures have grown increasingly complex, and the sharp growth in parameters and computation raises hardware requirements and hinders deployment in practical applications. Therefore, from the standpoint of lightweight model design, this paper addresses the following problems in 3D convolutional neural networks: the large number of parameters, the large number of floating-point operations, inadequate extraction of temporal information, and the information loss and information redundancy caused by global average pooling. The main research is as follows: 1. To address three issues (a large number of parameters, insufficient temporal information extraction, and information redundancy), a lightweight human action recognition algorithm with fused attention is proposed. First, an efficient residual block (ERB) is proposed to replace two cascaded 3×3×3 convolutional layers in order to reduce the network parameters and combine short-, medium-, and long-range temporal information. Second, the channel attention mechanism is introduced and extended, and a temporal attention mechanism is proposed; both are embedded into the model to reduce the impact of redundant information on the recognition results. Finally, experiments on the UCF101 dataset confirm the algorithm's effectiveness. In comparison to other action recognition
methods, the results show that the proposed algorithm achieves high recognition accuracy at a low cost in parameters and computation: 7.4M parameters, 3.5 GFLOPs, and 59.1% recognition accuracy without the use of pre-trained models. 2. To address the information loss and information redundancy caused by global average pooling (GAP), a human action recognition algorithm based on global frequency domain pooling (GFDP) is proposed. First, the Discrete Cosine Transform (DCT) is used to analyze the root of the GAP problem from a frequency domain perspective, which establishes that GAP is merely a special case of the DCT. Then, global frequency domain pooling is proposed to compensate for the information lost by GAP and to increase the distinctiveness between feature channels, thereby reducing information redundancy. Finally, to optimize the model, data augmentation and a convolutional layer standardization strategy are introduced, and the standardization strategy is extended to the fully connected layer to reduce the risk of overfitting. Experiments on the UCF101 dataset show that low-frequency components have a more pronounced complementary effect on information, and that the proposed model achieves a recognition accuracy of 63.0%, higher than the other action recognition methods compared. |
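
The statement that GAP is a special case of the DCT can be checked numerically. The following is a minimal sketch, not the GFDP implementation described above: it assumes an unnormalized 3D DCT-II applied to a single feature channel of shape T×H×W, and the function names (`dct3_coefficient`, `frequency_pool`) and the chosen frequency triples are hypothetical, introduced only for illustration. Under that assumption, the (0, 0, 0) coefficient is the plain sum of the feature map, so GAP equals that coefficient divided by T·H·W, and keeping a few additional coefficients retains information that GAP discards.

```python
import math
import random


def dct3_coefficient(x, u, v, w):
    """Unnormalized 3D DCT-II coefficient of x (nested lists, T x H x W) at (u, v, w)."""
    T, H, W = len(x), len(x[0]), len(x[0][0])
    total = 0.0
    for t in range(T):
        ct = math.cos(math.pi * u * (t + 0.5) / T)
        for h in range(H):
            ch = math.cos(math.pi * v * (h + 0.5) / H)
            for k in range(W):
                cw = math.cos(math.pi * w * (k + 0.5) / W)
                total += x[t][h][k] * ct * ch * cw
    return total


def global_average_pool(x):
    """Mean over all temporal and spatial positions of one channel."""
    T, H, W = len(x), len(x[0]), len(x[0][0])
    return sum(val for frame in x for row in frame for val in row) / (T * H * W)


def frequency_pool(x, freqs):
    """Hypothetical multi-frequency descriptor: one DCT coefficient per (u, v, w)."""
    return [dct3_coefficient(x, u, v, w) for (u, v, w) in freqs]


# Illustrative random feature map for a single channel (assumed sizes).
random.seed(0)
T, H, W = 4, 6, 6
feature = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(T)]

gap = global_average_pool(feature)
dc = dct3_coefficient(feature, 0, 0, 0)  # lowest-frequency (DC) component

# A few low-frequency components; the selection here is an arbitrary example.
descriptor = frequency_pool(feature, [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)])

print("GAP:", gap)
print("DC / (T*H*W):", dc / (T * H * W))
print("multi-frequency descriptor:", descriptor)
```

The first two printed values agree up to floating-point rounding, which is the sense in which GAP uses only the lowest-frequency component. How the frequencies are selected and combined in the proposed method is not specified in this abstract, so the descriptor here should be read only as an illustration of the idea.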