With the development of Internet technology and the spread of smart terminal devices, a large amount of video data is generated on Internet platforms every day. The quality of this content varies widely, and some videos contain illegal content such as violence and pornography, which can cause serious social harm. At the same time, some platforms want to recommend related videos to different users based on video content. Processing this video data manually is very inefficient, so video understanding based on deep learning has received increasing attention. Temporal action detection is a fundamental task in video understanding. Its goal is to detect the start and end times of human actions in a continuous untrimmed video and to classify each action. Temporal action detection can be applied in many fields, such as intelligent security, assisted medical care, and video recommendation, and has broad application prospects. Research on temporal action detection based on deep learning is therefore of great significance. In recent years, such research has made considerable progress, but some problems remain. In practical applications, the durations of different action instances in a video often differ greatly. This time-scale problem is the main difficulty currently facing temporal action detection. To address it, this paper first introduces a temporal adaptive module in place of temporal convolution. By effectively combining temporal convolution and self-attention, this module balances the short-term and long-term temporal information captured by the network. This paper also introduces a channel adaptive module, which adaptively adjusts channel weights according to the input features to guide the network to adapt to different action
instances and tasks. By combining the temporal adaptive module and the channel adaptive module, this paper proposes a temporal adaptive feature pyramid network that generates multi-scale temporal features to handle the time-scale problem of actions, which improves the network's flexibility in generating temporal proposals. In addition, this paper introduces a temporal deformable attention module, which flexibly adjusts the temporal receptive field of the network and helps it focus on useful temporal information. Integrating the temporal deformable attention module with the temporal adaptive feature pyramid network further improves detection accuracy. To verify the effectiveness of the proposed improvements, experiments are conducted on the THUMOS14 and ActivityNet-1.3 datasets. The results show that the proposed improvements are effective and competitive with state-of-the-art methods.