
Research On Action Recognition Method Based On Key Frame And Attention Mechanism

Posted on: 2024-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Zhang
Full Text: PDF
GTID: 2568307115497884
Subject: Electronic Information (Computer Technology) (Professional Degree)
Abstract:
Human action recognition has consistently been a popular research topic in computer vision, and it has important application value in fields such as intelligent video surveillance, smart elderly care, and human-computer interaction. The intricacy and diversity of human actions in videos, complex background environments, and variations in lighting pose significant challenges to action recognition. Traditional methods suffer from high computational cost, weak generalization ability, and low recognition rates. Deep-learning-based methods learn action features from videos efficiently with deep convolutional networks and achieve better recognition performance than traditional methods, but they still face issues such as low recognition rates, insufficient feature extraction, and high computational cost. We therefore carried out research on deep-learning-based human action recognition. The specific research contents are as follows:

(1) Most existing action recognition methods extract a frame sequence from the video by random sampling and feed it into the network for training. The sampled sequences contain many redundant frames and lose some key information in the video, so the network model cannot learn sufficient features, which hurts recognition accuracy. We therefore propose a video key frame extraction method based on image information entropy and HOG_SSIM. First, the image information entropy of each video frame is calculated, and candidate key frames are obtained by a preliminary filtering at the local extrema of the entropy curve of the frame sequence. Then, the HOG_SSIM similarity algorithm is applied to compute the similarity between the candidate key frames, and the final key frame sequence is obtained with the designed screening strategy. The effectiveness of the key frame extraction algorithm is validated through experiments on multiple video datasets: the extracted key frames accurately represent the main content of the video, and the method achieves a high compression ratio and a low false detection rate.
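The following is a minimal Python sketch of the key frame extraction pipeline described in (1): frame-level image information entropy, candidate selection at the local extrema of the entropy curve, and similarity-based screening. The helper names, the equal weighting of the HOG and SSIM scores, and the similarity threshold are illustrative assumptions rather than the thesis implementation, and the SSIM here is a simplified global (non-windowed) form.

```python
# Illustrative sketch of entropy + HOG/SSIM key frame selection (assumptions noted above).
import cv2
import numpy as np

def frame_entropy(gray):
    """Shannon entropy of an 8-bit grayscale frame."""
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def hog_ssim_similarity(a, b):
    """Assumed combination: cosine similarity of HOG descriptors
    averaged with a simplified global SSIM of the two frames."""
    hog = cv2.HOGDescriptor()                     # default 64x128 window
    ha, hb = hog.compute(a).ravel(), hog.compute(b).ravel()
    cos = float(ha @ hb / (np.linalg.norm(ha) * np.linalg.norm(hb) + 1e-8))
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    c1, c2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2
    ssim = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2))
    return 0.5 * cos + 0.5 * ssim

def extract_key_frames(video_path, sim_threshold=0.85):
    cap = cv2.VideoCapture(video_path)
    frames, entropies = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(cv2.resize(frame, (64, 128)), cv2.COLOR_BGR2GRAY)
        frames.append(gray)
        entropies.append(frame_entropy(gray))
    cap.release()
    # Candidate key frames: local extrema of the entropy curve.
    candidates = [i for i in range(1, len(frames) - 1)
                  if (entropies[i] - entropies[i - 1]) * (entropies[i + 1] - entropies[i]) < 0]
    # Screening: drop candidates too similar to the most recently kept key frame.
    keys = []
    for i in candidates:
        if not keys or hog_ssim_similarity(frames[keys[-1]], frames[i]) < sim_threshold:
            keys.append(i)
    return keys
```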
(2) To address the problems of existing three-dimensional convolutional neural network (3D CNN) models, such as a large number of parameters, high computational complexity, and insufficient feature extraction ability, we propose an action recognition method based on a decoupled three-dimensional residual attention network. The method takes key frames as input. First, decoupled 3D convolution modules extract spatiotemporal features. Second, a 3D attention mechanism module reweights the features of key actions in the video to enhance the extraction of key action features. Finally, a 3D residual module fuses high- and low-order spatiotemporal features to alleviate problems such as vanishing gradients in deep networks. Several sets of experiments on the UCF101 and HMDB51 datasets demonstrate the feasibility and validity of the modules in the proposed method, which effectively improve the performance of the network model and deliver better performance in action recognition tasks. Compared with the C3D network, the number of parameters of our method is reduced by about 80%, and the accuracy on the UCF101 dataset is increased by 7.73%.

(3) To address the problems that the 3D CNN model focuses only on local information during feature extraction, cannot capture global dependencies between spatiotemporal features, and incurs high computational cost due to its large number of parameters and calculations, we propose an action recognition method based on a global spatiotemporal attention mechanism (GSTAM) and PCA_3DNet. First, spatiotemporal feature information is extracted from the video sequence with a pseudo-3D convolution structure. Second, a channel attention mechanism (CAM) is added to help the network model learn key channel information, forming a pseudo-three-dimensional channel attention network (PCA_3DNet). Finally, the global spatiotemporal attention mechanism (GSTAM) is added to capture long-range dependencies between spatiotemporal features, so that the model can capture more global information. The method is validated through experiments and analyses on the UCF101, HMDB51, and UCF11 datasets from various perspectives. Compared with the C3D network, the accuracy of our method is improved by 8.3%, while the number of parameters and the amount of computation are only 1/6 and 1/9 of those of the C3D model, respectively.
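As a rough illustration of the attention components described in (2) and (3), the PyTorch sketch below shows a 3D channel attention block and a non-local-style global spatiotemporal attention block operating on a (N, C, T, H, W) feature map. The module structure, class names, reduction ratio, and softmax scaling are assumptions for illustration; they are not the thesis's PCA_3DNet or GSTAM implementations.

```python
# Sketch of 3D channel attention and global spatiotemporal attention (assumed designs).
import torch
import torch.nn as nn

class ChannelAttention3D(nn.Module):
    """Squeeze-and-excitation style reweighting of channels in a 5D feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1, 1)
        return x * w                                # reweight each channel

class GlobalSpatioTemporalAttention(nn.Module):
    """Non-local-style block: every spatiotemporal position attends to all others,
    capturing long-range dependencies that plain 3D convolution misses."""
    def __init__(self, channels, inner=None):
        super().__init__()
        inner = inner or channels // 2
        self.query = nn.Conv3d(channels, inner, 1)
        self.key = nn.Conv3d(channels, inner, 1)
        self.value = nn.Conv3d(channels, inner, 1)
        self.out = nn.Conv3d(inner, channels, 1)

    def forward(self, x):
        n, c, t, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (N, THW, C')
        k = self.key(x).flatten(2)                      # (N, C', THW)
        v = self.value(x).flatten(2).transpose(1, 2)    # (N, THW, C')
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(n, -1, t, h, w)
        return x + self.out(y)                          # residual connection

# Usage on a small 3D CNN feature map.
feat = torch.randn(2, 64, 8, 14, 14)
feat = ChannelAttention3D(64)(feat)
feat = GlobalSpatioTemporalAttention(64)(feat)
print(feat.shape)  # torch.Size([2, 64, 8, 14, 14])
```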
Keywords: action recognition, key frame extraction, three-dimensional convolutional neural network, attention mechanism, spatiotemporal feature