Experiment Of Decoupled Operators On Two-Stream Convolutional Neural Networks

Posted on: 2020-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Zhao
GTID: 2417330590482853
Subject: Applied Statistics
Abstract/Summary:
Action recognition is a fundamental problem in computer vision. As the technology has matured, it has found use in intelligent video surveillance, virtual reality, Internet video retrieval, human-computer interaction, and other scenarios, and it has broad prospects. However, action recognition still faces many difficulties, including how to extract powerful features and how to integrate multiple features. These problems have hindered the engineering deployment of action recognition, so this thesis mainly discusses how to improve the accuracy and robustness of action recognition algorithms.

Following the success of convolutional neural networks on image tasks in 2012, new network architectures such as VGG and ResNet have emerged in recent years. In the video domain, current techniques include two-stream neural networks and 3D convolution. However, the feature expression ability of current networks is not strong enough, and action recognition datasets are small compared with ImageNet, so models overfit easily during training. To address this problem, and building on the results of two prior studies, this thesis extends the decoupled operator to a deep two-stream convolutional neural network and examines whether it improves feature expression ability.

The dataset used is UCF-101, which contains 13,320 video clips in 101 action categories. The VGG16 architecture is used to build a temporal stream network and a spatial stream network. During training, ImageNet-pretrained models are used to initialize the networks, and online data augmentation is applied. Because the networks are pretrained, a low learning rate is used: the temporal stream network starts at a learning rate of 0.001 and decays by a factor of 1/10 every 100 iterations, while the spatial stream network starts at a learning rate of 0.001, decays by a factor of 1/10 at 200 iterations, and then decays by a factor of 1/10 every 100 iterations.

After training and testing, the traditional two-stream convolutional network reaches an accuracy of 73.619% for the spatial stream network, 67.962% for the temporal stream network, and 77.134% for the fused two-stream network. With the decoupled operator, the two-stream convolutional network reaches an accuracy of 74.015% for the spatial stream network, 68.214% for the temporal stream network, and 78.192% for the fused two-stream network.
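The learning-rate schedules described in the abstract can be sketched as a simple step function. This is a minimal illustration, not the thesis's training code: the function names (`step_lr`, `temporal_lr`, `spatial_lr`) are hypothetical, and the sketch assumes that "decays by 1/10" means the rate is multiplied by 0.1 at each decay point, that the temporal stream's first decay falls at iteration 100, and that the spatial stream keeps decaying every 100 iterations after its first decay at iteration 200.

```python
import math


def step_lr(iteration, base_lr=0.001, first_decay=100, step=100, gamma=0.1):
    """Learning rate at `iteration` under a step schedule.

    Assumption: each decay multiplies the rate by `gamma` (0.1 here).
    The first decay happens at `first_decay`, then every `step` iterations.
    """
    if iteration < first_decay:
        return base_lr
    num_decays = 1 + (iteration - first_decay) // step
    return base_lr * gamma ** num_decays


def temporal_lr(iteration):
    # Temporal stream: start at 0.001, decay every 100 iterations.
    return step_lr(iteration, first_decay=100, step=100)


def spatial_lr(iteration):
    # Spatial stream: start at 0.001, first decay at iteration 200,
    # then decay every 100 iterations.
    return step_lr(iteration, first_decay=200, step=100)


if __name__ == "__main__":
    for it in (0, 100, 200, 300):
        print(it, temporal_lr(it), spatial_lr(it))
```

Under these assumptions the spatial stream simply holds the initial rate 100 iterations longer than the temporal stream before following the same decay pattern.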
Keywords/Search Tags: action recognition, decoupled operators, two-stream networks, convolutional neural networks