Accurate Spatiotemporal Action Detection For Videos In Complex Scenes

Posted on:2024-08-04

Degree:Master

Type:Thesis

Country:China

Candidate:X Z Xu

Full Text:PDF

GTID:2568307100466144

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

With the rapid development of the Internet and mobile devices, video has become a common carrier of information. In the context of building "smart cities", effective analysis and understanding of video not only unlocks the value of video information but also helps promote high-quality development and strengthen comprehensive social governance. Spatiotemporal action detection is one of the important research areas for building a "smart city", and it has therefore become an active research topic in academia and industry. To address problems in this area, this thesis proposes two methods based on deep learning: a spatiotemporal action detection method based on an improved action tube connection strategy, and a spatiotemporal action detection method based on a global attention mechanism. The main research work is as follows.

(1) Previous studies ignore the large gap between the detection scores of different frames. Building on earlier research, this thesis proposes a spatiotemporal action detection method based on an improved action tube connection strategy (IATC). The method mainly comprises a 2D convolutional neural network branch, a 3D convolutional neural network branch, and a channel fusion and attention block, among other components. A block for smoothing the action tube connection is designed to address the large score gaps. By incorporating the proportion of the detection scores of different frames, the block reduces the influence of excessive score gaps on the detection results. Experimental results on the UCF101-24 dataset demonstrate that the proposed method improves the accuracy of spatiotemporal action detection.

(2) To address the insufficient feature description ability of 3D convolutional neural networks, this thesis proposes a spatiotemporal action detection method based on a global attention mechanism (GAM-IATC). The method mainly comprises a 2D convolutional neural network branch, a 3D convolutional neural network branch based on global attention, and a feature fusion block based on a correlation coefficient matrix, among other components. In the 3D convolutional neural network branch, a global attention block is added to enhance the feature description ability of the 3D convolutional neural network. To further improve the detection performance of the network model, the 2D convolutional neural network branch, the bounding box loss function, and the feature fusion block are also improved. On the UCF101-24 and AVA datasets, the proposed GAM-IATC method achieves better experimental results than the baseline methods. In addition, the results of ablation experiments demonstrate that the proposed block can improve the performance of spatiotemporal action detection.
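The abstract does not give the formula used by the smoothing block in method (1), so the following is only a minimal sketch of one way score smoothing could enter action tube linking. It assumes a greedy linker that picks one detection per frame, a link score equal to the smoothed detection score plus the IoU with the previously linked box, and a blend weight `alpha`. The names `iou`, `link_tube`, and `alpha` are hypothetical and are not taken from the thesis.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tube(frames, alpha=0.5):
    """Greedily link one detection per frame into a single action tube.

    frames: list of per-frame detection lists, each detection a (box, score)
            pair. Every frame is assumed to hold at least one detection.
    alpha:  assumed weight on the current frame's score; the remaining
            1 - alpha is taken from the previously linked (smoothed) score.
    """
    # Start from the highest-scoring detection in the first frame.
    tube = [max(frames[0], key=lambda det: det[1])]
    for detections in frames[1:]:
        prev_box, prev_score = tube[-1]
        best, best_link = None, float("-inf")
        for box, score in detections:
            # Blend the raw score with the previous score so that a sudden
            # drop or spike between frames is damped in the stored tube score.
            smoothed = alpha * score + (1.0 - alpha) * prev_score
            link = smoothed + iou(prev_box, box)
            if link > best_link:
                best, best_link = (box, smoothed), link
        tube.append(best)
    return tube


frames = [
    [((0, 0, 10, 10), 0.9)],
    [((0, 0, 10, 10), 0.1), ((50, 50, 60, 60), 0.3)],
]
tube = link_tube(frames, alpha=0.5)
```

In this toy input the second frame's overlapping box has a much lower raw score than the first frame's box. The blended score stored in the tube sits between the two, which is the kind of gap reduction the abstract describes. The actual weighting used by IATC may differ.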

Keywords/Search Tags:

Spatiotemporal action detection, Action tube, Attention mechanism, 3D convolutional neural network, Channel fusion

Related items

1	Research On Skeleton Action Recognition Algorithm Based On Spatiotemporal Attention Mechanism
2	Video Action Research Based On Attention Mechanism And Spatiotemporal Fusion Network
3	Research On Video Action Recognition Model Based On Convolutional Neural Network With Attention Mechanism
4	Research On Action Recognition Method Based On Key Frame And Attention Mechanism
5	Skeleton Action Recognition Study Based On Collaborative Spatiotemporal Attention
6	Temporal Action Localization And Action Recognition Based On Deep Learning
7	Research On Coarse-to-fine Action Understanding Technologies For Video
8	Human Action Recognition Based On Convolutional Neural Network
9	Research On Action Recognition Method Based On Graph Convolutional Neural Network
10	Human Action Detection Research Based On Convolutional Neural Network