Research On Video Action Recognition Based On Deep Learning

Posted on: 2020-08-19, Degree: Master, Type: Thesis
Country: China, Candidate: M An
GTID: 2428330578966555, Subject: Information and Communication Engineering
Abstract/Summary:
Video action recognition uses computers to analyze and identify human actions in video sequences and is an active research topic in computer vision. It has important research significance and broad application prospects in intelligent monitoring, smart homes, abnormal human behavior detection, and human-computer interaction. Existing action recognition models use only the appearance information and short-term motion information of a video and lack the ability to learn dependencies over long time sequences. This thesis therefore studies and improves action recognition methods to make them better suited to real-world use.

To recognize human actions in video, this thesis uses two independent convolutional neural networks to extract the spatial and temporal information of a video sequence respectively, and then combines them with a long short-term memory (LSTM) network to form Long-term Recurrent Convolutional Networks (LRCN). The LSTM units introduce dependencies across the video sequence, so that the LRCN network can process video sequences with long temporal structure. Experimental results show that the LRCN action recognition model has good robustness and generalization ability. The application of the LRCN model in power systems is also discussed.

When the LRCN model processes long video sequences, it samples frames densely, which tends to generate a large amount of redundant information and increases the computational cost of the network, and it cannot learn the long-range temporal structure of complex, long-duration actions well. This thesis therefore replaces the LRCN network with Temporal Segment Networks (TSN), which sample sparsely over the whole video, as the base model for long-range temporal modeling. TSN is combined with temporal pyramid pooling to form the TSN-TPP network for human action recognition. TSN-TPP aggregates frame-level features at multiple time scales into a fixed-length video-level feature, which strengthens the weak temporal structure in video. Experimental results show that this method effectively improves the accuracy of action recognition.

Finally, the LRCN and TSN-TPP action recognition models are ported to the Jetson TK1, a GPU-accelerated embedded platform, so that human action recognition runs on front-end devices and reduces the load on the server of processing large amounts of video data.
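
The two ideas behind TSN-TPP named in the abstract, sparse sampling over the whole video and pooling frame-level features at several time scales into one fixed-length vector, can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the pyramid levels (1, 2, 4), the use of max pooling, and the choice of the middle frame for deterministic sampling are all assumptions made here for illustration, and plain Python lists stand in for CNN features.

```python
import random


def sparse_sample_indices(num_frames, num_segments, rng=None):
    """Pick one frame index from each of num_segments equal-length segments.

    Hypothetical helper: with rng=None the middle frame of each segment is
    used (deterministic); otherwise a random frame is drawn per segment.
    """
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments  # exclusive bound
        if rng is None:
            indices.append((start + end - 1) // 2)
        else:
            indices.append(rng.randrange(start, end))
    return indices


def temporal_pyramid_pool(features, levels=(1, 2, 4)):
    """Pool T frame-level feature vectors into one fixed-length vector.

    For each pyramid level the time axis is split into that many bins, each
    bin is max-pooled per dimension, and all pooled bins are concatenated.
    The output length is dim * sum(levels), independent of T.
    Assumption: max pooling and these levels are illustrative choices.
    """
    t = len(features)
    dim = len(features[0])
    pooled = []
    for bins in levels:
        for b in range(bins):
            start = b * t // bins
            # Guarantee a non-empty window even when T < bins.
            end = max((b + 1) * t // bins, start + 1)
            window = features[start:end]
            pooled.extend(max(f[d] for f in window) for d in range(dim))
    return pooled


if __name__ == "__main__":
    frame_ids = sparse_sample_indices(num_frames=30, num_segments=3)
    # Toy 2-dimensional "features" for four sampled frames.
    feats = [[1.0, 0.0], [2.0, 5.0], [3.0, 1.0], [0.0, 4.0]]
    video_feature = temporal_pyramid_pool(feats)
    print(frame_ids, len(video_feature))
```

Because the pooled length depends only on the feature dimension and the pyramid levels, videos with different numbers of sampled frames map to vectors of the same size, which is what lets a fixed classifier sit on top of the pooled feature.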
Keywords: action recognition, long-term recurrent convolutional networks, temporal segment networks, temporal pyramid pooling, embedded GPU