
Research On Action Recognition Method Based On Key Frame And Attention Mechanism

Posted on: 2024-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Zhang
Full Text: PDF
GTID: 2568307115497884
Subject: Electronic Information (Computer Technology) (Professional Degree)
Abstract:
Human action recognition has consistently been a popular research topic in computer vision, and it has important application value in fields such as intelligent video surveillance, smart elderly care, and human-computer interaction. The intricacy and diversity of human actions in videos, complex background environments, and variations in lighting pose significant challenges to action recognition. Traditional methods suffer from high computational cost, weak generalization ability, and low recognition rates. Deep-learning-based methods learn action features from videos efficiently with deep convolutional networks and achieve better recognition performance than traditional methods, but they still face issues such as low recognition rates, insufficient feature extraction, and high computational cost. We therefore carried out research on deep-learning-based human action recognition. The specific research contents are as follows:

(1) Most existing action recognition methods extract a frame sequence from the video by random sampling and feed it into the network for training. The sampled sequences contain many redundant frames and lose some key information in the video, so the network model cannot learn sufficient features, which hurts recognition accuracy. We therefore propose a video key frame extraction method based on image information entropy and HOG_SSIM. First, the image information entropy of each video frame is calculated, and candidate key frames are obtained by a preliminary filtering at the local extrema of the entropy curve of the frame sequence. Then, the HOG_SSIM similarity algorithm is applied to compute the similarity between the candidate key frames, and the final key frame sequence is obtained with the designed screening strategy. The effectiveness of the key frame extraction algorithm is validated through experiments on multiple video datasets: the extracted key frames accurately represent the main content of the video, and the method achieves a high compression ratio and a low false detection rate.
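The following is a minimal Python sketch of the key frame extraction pipeline described in (1): frame-level image information entropy, candidate selection at the local extrema of the entropy curve, and similarity-based screening. The helper names, the equal weighting of the HOG and SSIM scores, and the similarity threshold are illustrative assumptions rather than the thesis implementation, and the SSIM here is a simplified global (non-windowed) form.

```python
# Illustrative sketch of entropy + HOG/SSIM key frame selection (assumptions noted above).
import cv2
import numpy as np

def frame_entropy(gray):
    """Shannon entropy of an 8-bit grayscale frame."""
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def hog_ssim_similarity(a, b):
    """Assumed combination: cosine similarity of HOG descriptors
    averaged with a simplified global SSIM of the two frames."""
    hog = cv2.HOGDescriptor()                     # default 64x128 window
    ha, hb = hog.compute(a).ravel(), hog.compute(b).ravel()
    cos = float(ha @ hb / (np.linalg.norm(ha) * np.linalg.norm(hb) + 1e-8))
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    c1, c2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2
    ssim = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2))
    return 0.5 * cos + 0.5 * ssim

def extract_key_frames(video_path, sim_threshold=0.85):
    cap = cv2.VideoCapture(video_path)
    frames, entropies = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(cv2.resize(frame, (64, 128)), cv2.COLOR_BGR2GRAY)
        frames.append(gray)
        entropies.append(frame_entropy(gray))
    cap.release()
    # Candidate key frames: local extrema of the entropy curve.
    candidates = [i for i in range(1, len(frames) - 1)
                  if (entropies[i] - entropies[i - 1]) * (entropies[i + 1] - entropies[i]) < 0]
    # Screening: drop candidates too similar to the most recently kept key frame.
    keys = []
    for i in candidates:
        if not keys or hog_ssim_similarity(frames[keys[-1]], frames[i]) < sim_threshold:
            keys.append(i)
    return keys
```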
(2) To address the problems of existing three-dimensional convolutional neural network (3D CNN) models, such as a large number of parameters, high computational complexity, and insufficient feature extraction ability, we propose an action recognition method based on a decoupled three-dimensional residual attention network. The method takes key frames as input. First, decoupled 3D convolution modules extract spatiotemporal features. Second, a 3D attention mechanism module reweights the features of key actions in the video to enhance the extraction of key action features. Finally, a 3D residual module fuses high- and low-order spatiotemporal features to alleviate problems such as vanishing gradients in deep networks. Several sets of experiments on the UCF101 and HMDB51 datasets demonstrate the feasibility and validity of the modules in the proposed method, which effectively improve the performance of the network model and deliver better performance in action recognition tasks. Compared with the C3D network, the number of parameters of our method is reduced by about 80%, and the accuracy on the UCF101 dataset is increased by 7.73%.

(3) To address the problems that the 3D CNN model focuses only on local information during feature extraction, cannot capture global dependencies between spatiotemporal features, and incurs high computational cost due to its large number of parameters and calculations, we propose an action recognition method based on a global spatiotemporal attention mechanism (GSTAM) and PCA_3DNet. First, spatiotemporal feature information is extracted from the video sequence with a pseudo-3D convolution structure. Second, a channel attention mechanism (CAM) is added to help the network model learn key channel information, forming a pseudo-three-dimensional channel attention network (PCA_3DNet). Finally, the global spatiotemporal attention mechanism (GSTAM) is added to capture long-range dependencies between spatiotemporal features, so that the model can capture more global information. The method is validated through experiments and analyses on the UCF101, HMDB51, and UCF11 datasets from various perspectives. Compared with the C3D network, the accuracy of our method is improved by 8.3%, while the number of parameters and the amount of computation are only 1/6 and 1/9 of those of the C3D model, respectively.
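As a rough illustration of the attention components described in (2) and (3), the PyTorch sketch below shows a 3D channel attention block and a non-local-style global spatiotemporal attention block operating on a (N, C, T, H, W) feature map. The module structure, class names, reduction ratio, and softmax scaling are assumptions for illustration; they are not the thesis's PCA_3DNet or GSTAM implementations.

```python
# Sketch of 3D channel attention and global spatiotemporal attention (assumed designs).
import torch
import torch.nn as nn

class ChannelAttention3D(nn.Module):
    """Squeeze-and-excitation style reweighting of channels in a 5D feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1, 1)
        return x * w                                # reweight each channel

class GlobalSpatioTemporalAttention(nn.Module):
    """Non-local-style block: every spatiotemporal position attends to all others,
    capturing long-range dependencies that plain 3D convolution misses."""
    def __init__(self, channels, inner=None):
        super().__init__()
        inner = inner or channels // 2
        self.query = nn.Conv3d(channels, inner, 1)
        self.key = nn.Conv3d(channels, inner, 1)
        self.value = nn.Conv3d(channels, inner, 1)
        self.out = nn.Conv3d(inner, channels, 1)

    def forward(self, x):
        n, c, t, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (N, THW, C')
        k = self.key(x).flatten(2)                      # (N, C', THW)
        v = self.value(x).flatten(2).transpose(1, 2)    # (N, THW, C')
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(n, -1, t, h, w)
        return x + self.out(y)                          # residual connection

# Usage on a small 3D CNN feature map.
feat = torch.randn(2, 64, 8, 14, 14)
feat = ChannelAttention3D(64)(feat)
feat = GlobalSpatioTemporalAttention(64)(feat)
print(feat.shape)  # torch.Size([2, 64, 8, 14, 14])
```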
Keywords: action recognition, key frame extraction, three-dimensional convolutional neural network, attention mechanism, spatiotemporal feature