Accurate Spatiotemporal Action Detection For Videos In Complex Scenes

Posted on: 2024-08-04 | Degree: Master | Type: Thesis
Country: China | Candidate: X Z Xu
GTID: 2568307100466144 | Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rapid development of the Internet and mobile devices, video has become a common carrier of information exchange. In the context of building "smart cities", effective analysis and understanding of video not only unlocks the value of video information but also helps promote high-quality development and strengthen comprehensive social governance. Spatiotemporal action detection is one of the important research areas for building a "smart city", and it has therefore become an active research topic in both academia and industry. To address the problems in this area, this thesis proposes two methods based on deep learning: a spatiotemporal action detection method based on an improved action tube connection strategy, and a spatiotemporal action detection method based on a global attention mechanism. The main research work is as follows.

(1) Previous studies ignore the large gap between the detection scores of different frames. To address this issue, and building on previous research, this thesis proposes a spatiotemporal action detection method based on an improved action tube connection strategy (IATC). The method mainly comprises a 2D convolutional neural network branch, a 3D convolutional neural network branch, and a channel fusion and attention block, among other components. A block for smoothing action tube connection is designed to solve the problem of large score gaps. By incorporating the proportion of the detection scores of different frames, the block reduces the influence of excessive score gaps on the detection results. Experimental results on the UCF101-24 dataset demonstrate that the proposed method improves the accuracy of spatiotemporal action detection.

(2) To address the insufficient feature description ability of 3D convolutional neural networks, this thesis proposes a spatiotemporal action detection method based on a global attention mechanism (GAM-IATC). The method mainly comprises a 2D convolutional neural network branch, a 3D convolutional neural network branch based on global attention, and a feature fusion block based on a correlation coefficient matrix, among other components. In the 3D convolutional neural network branch, a global attention block is added to enhance the feature description ability of the 3D convolutional neural network. To further improve the detection performance of the network model, the 2D convolutional neural network branch, the bounding box loss function, and the feature fusion block are also improved. On the UCF101-24 dataset and the AVA dataset, the proposed GAM-IATC method achieves better experimental results than the baseline methods. In addition, the results of ablation experiments demonstrate that the proposed block improves the performance of spatiotemporal action detection.
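The abstract does not give the formula used by the smoothing block, so the following is only a minimal sketch of the general idea, under stated assumptions. It links per-frame detections into one action tube greedily, and it blends the tube's running score with each candidate's score before adding the overlap term, so that a single frame with a very different score has less influence on the link. The names `Detection`, `smoothed_score`, `link_tube`, and the blend weight `alpha` are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    box: Box
    score: float  # per-frame detection confidence


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def smoothed_score(running: float, new: float, alpha: float) -> float:
    """Assumed smoothing rule: blend the tube's running score with the new one.

    alpha = 0 uses the raw per-frame score (no smoothing).
    """
    return alpha * running + (1.0 - alpha) * new


def link_tube(frames: List[List[Detection]], alpha: float = 0.5) -> List[Detection]:
    """Greedily link one detection per frame into a single action tube."""
    if not frames or not frames[0]:
        return []
    # Start from the highest-scoring detection in the first frame.
    tube = [max(frames[0], key=lambda d: d.score)]
    running = tube[0].score
    for dets in frames[1:]:
        if not dets:
            break  # the tube ends when a frame has no detections
        last_box = tube[-1].box
        # Link score = smoothed confidence + spatial overlap with the tube's last box.
        best = max(
            dets,
            key=lambda d: smoothed_score(running, d.score, alpha) + iou(last_box, d.box),
        )
        running = smoothed_score(running, best.score, alpha)
        tube.append(best)
    return tube


frames = [
    [Detection((0, 0, 10, 10), 0.9)],
    [Detection((2, 0, 12, 10), 0.2), Detection((50, 50, 60, 60), 0.99)],
]
smoothed = link_tube(frames, alpha=0.5)
unsmoothed = link_tube(frames, alpha=0.0)
```

In this toy input the second frame contains a low-scoring box that overlaps the tube and a high-scoring box far away. With `alpha=0.5` the overlapping box is linked; with `alpha=0.0` the raw score gap pulls the tube to the distant box.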
Keywords/Search Tags:Spatiotemporal action detection, Action tube, Attention mechanism, 3D convolutional neural network, Channel fusion