
Research On Lightweight Action Recognition Method In Edge Environment

Posted on: 2024-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Zhao
Full Text: PDF
GTID: 2568307157482464
Subject: Computer Science and Technology
Abstract/Summary:
Video action recognition has become a research hotspot in computer vision and is widely used in public security, video review, human-computer interaction, and other fields. Video combines temporal and spatial information, so the data is complex and inference models are usually very large. Because of their demand for computing resources, these methods are mostly deployed on cloud servers. As video resolution and volume grow, cloud-based action recognition is constrained by insufficient bandwidth, network congestion, and similar issues, and it struggles to meet real-time requirements. Edge computing moves some services from cloud centers to edge nodes; it offers low latency and high security and supports real-time computing tasks well. Compared with cloud servers and their rich computing resources, however, edge devices are severely limited in power, computing, storage, and other resources. It is therefore important to study lightweight action recognition methods. This thesis studies lightweight action recognition methods for resource-constrained edge environments and makes two contributions:

(1) Enhanced temporal multimodal attention network for action recognition. Because video is complex and temporally redundant, dense sampling incurs large transmission and input costs. Common action recognition methods sparsely sample the video sequence, but they cannot balance long-term and short-term temporal information. An enhanced temporal multimodal attention network is therefore proposed to aggregate long-term and local temporal information. It consists mainly of two parts: local motion enhancement and differential time interaction. The local motion enhancement module combines local motion features through image differences and a temporal attention mechanism. The differential time interaction module uses channel differences and pixel-level spatial differences to learn spatio-temporal features efficiently. In this way, both short-term and long-term motion features are aggregated. The modules designed in this thesis can be flexibly embedded into convolutional neural networks to achieve efficient spatio-temporal modeling.

(2) Graph-based cross-modal knowledge distillation for an edge action recognition model. As a precise description of motion, optical flow has played an important role in action recognition tasks. However, computing optical flow adds computation and time costs that cannot be ignored, which makes it difficult to apply in edge environments. This thesis therefore proposes an edge action recognition method that replaces optical flow with residual frames in a two-stream structure. Residual frames are simple to compute and require few computational resources. However, their motion information is very blurry, and using them directly can severely degrade accuracy. Knowledge distillation is used both to improve model performance and to compress the model. To handle the feature differences caused by the different input modalities (optical flow and residual frames), knowledge is transferred by mining the relationships between features and building graph knowledge. Residual frames reduce computation and power consumption during data preprocessing, and model compression reduces them during model inference. The experimental results show that the method reduces resource consumption and accelerates inference while maintaining good performance.
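To make the cost argument for residual frames concrete, the sketch below computes them for a short clip. It assumes the common definition of a residual frame as the pixel-wise difference between consecutive frames; the abstract does not state the exact formulation used in the thesis, so the function name `residual_frames`, the `(T, H, W, C)` layout, and the `int16` output type are illustrative choices rather than the author's implementation.

```python
import numpy as np


def residual_frames(clip: np.ndarray) -> np.ndarray:
    """Return pixel-wise differences between consecutive frames.

    clip: uint8 array of shape (T, H, W, C).
    The result has shape (T - 1, H, W, C) and dtype int16, so that
    negative differences are kept instead of wrapping around as they
    would in unsigned 8-bit arithmetic.
    """
    if clip.ndim != 4 or clip.shape[0] < 2:
        raise ValueError("expected a (T, H, W, C) clip with T >= 2")
    frames = clip.astype(np.int16)
    return frames[1:] - frames[:-1]


# Toy clip (assumed data): 3 frames of 2x2 RGB in which one pixel
# brightens and then dims while every other pixel stays black.
clip = np.zeros((3, 2, 2, 3), dtype=np.uint8)
clip[1, 0, 0] = 200
clip[2, 0, 0] = 50
res = residual_frames(clip)
```

The whole operation is one subtraction per pixel, which is why it is far cheaper than optical flow estimation. Static regions cancel to zero, so only moving content remains, but the result carries no explicit direction or magnitude of motion. This is consistent with the abstract's remark that residual frames are blurry and benefit from distillation from an optical-flow model.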
Keywords/Search Tags: Action Recognition, Attention Mechanism, Knowledge Distillation, Edge Environment, Lightweight