
Research On Human Action Recognition Method Based On 3D Convolutional Neural Network

Posted on: 2021-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Fan
Full Text: PDF
GTID: 2428330614461456
Subject: Computer technology
Abstract/Summary:
Video-based human action recognition has been a hot research topic in computer vision in recent years. It is widely used in intelligent human-computer interaction and virtual reality, intelligent video surveillance and content-based video retrieval, smart medical treatment and nursing, and other fields. However, extracting robust features from complex and changeable human actions remains difficult in real environments with cluttered backgrounds, occlusion, and lighting changes. Traditional methods usually require manually designed features and rely on sufficient prior knowledge to achieve a high recognition rate. Following the success of CNNs in visual tasks such as image classification and object detection, many deep learning methods have gradually been applied to action recognition, and significant progress has been made. This thesis conducts an in-depth study of action recognition based on the 3D CNN architecture. The main contributions are as follows:

(1) Existing 3D CNN architectures are highly complex, which makes it difficult for them to learn rich and abstract deep features. To address this, a lightweight multi-scale convolution model is proposed. The model enlarges the local receptive field in each layer of the network by embedding a lightweight multi-scale convolution module in a 3D convolutional residual network. While significantly reducing model complexity, it also extracts multi-scale features of the target, which significantly strengthens the representation of the target. Finally, a channel attention mechanism is applied to the multi-scale features to extract key features. Experimental results show that the proposed model achieves a high action recognition rate at reduced model complexity.

(2) RGB images contain rich appearance information that describes the details and texture of human actions well, while optical flow (Flow) images contain important motion information such as the speed and direction of the moving target. Therefore, an action recognition method based on multi-modal image input is proposed. The method generates intermediate images by fusing useful information from the RGB and Flow images, and then combines them with the RGB images to form multi-modal images, giving the network multi-source input. The stage at which the features of the two modalities are fused is also studied to further improve network performance. Experimental results show that this method achieves a higher action recognition rate than other methods based on 3D CNN architectures.
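The channel attention step in contribution (1) can be illustrated with a minimal sketch. The abstract does not say which attention design the thesis uses, so the code below assumes a squeeze-and-excitation style gate; the function name `channel_attention`, the weight matrices `w1` and `w2`, and the flattened feature layout are hypothetical and chosen only for illustration.

```python
import math
import random


def channel_attention(features, w1, w2):
    """Squeeze-and-excitation style channel gating (assumed design).

    features: list of C channels, each a flat list of activations
              (a T*H*W spatio-temporal volume, flattened).
    w1:       hidden x C weight matrix (bottleneck layer).
    w2:       C x hidden weight matrix (expansion layer).
    Returns the rescaled features and the per-channel gates.
    """
    # Squeeze: global average pooling over all spatio-temporal positions.
    pooled = [sum(ch) / len(ch) for ch in features]
    # Excitation, step 1: bottleneck layer followed by ReLU.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    # Excitation, step 2: expansion layer followed by a sigmoid gate in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Rescale: multiply every activation of a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return scaled, gates


# Toy usage: 4 channels, each a 2x3x3 volume flattened to 18 values.
random.seed(0)
C, HIDDEN, VOLUME = 4, 2, 18
feats = [[random.random() for _ in range(VOLUME)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(HIDDEN)]
w2 = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(C)]
scaled_feats, channel_gates = channel_attention(feats, w1, w2)
```

In a trained network the weights would be learned, so that channels carrying key features receive gates close to 1 and less informative channels are suppressed. Here the weights are random and only the data flow is shown.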
Keywords/Search Tags: Human action recognition, lightweight multi-scale convolutional module, multi-scale features, channel attention mechanism, multi-modal image