
Research And Implementation Of The Short Video Highlight Detection Algorithm Based On Multi-Type Transformer

Posted on: 2024-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Sun
Full Text: PDF
GTID: 2568306944970409
Subject: Computer Science and Technology
Abstract/Summary:
Highlight detection aims to identify the most attractive segments of a video. For traditional video types such as movies and sports events, this task is often performed manually. However, short videos are an emerging video type that anyone can create and upload, and their volume has become enormous. To reduce manual labor, researchers are therefore exploring video processing technology that completes this task automatically. Previous highlight detection methods suffer from low detection accuracy, insufficient use of the available information, and narrow application scenarios. To address these issues, this thesis proposes a highlight detection algorithm based on information fusion. First, to obtain global information and facilitate data exchange between visual and audio features, the algorithm uses a temporal feature fusion model composed of near, far, and cross Transformer modules. Second, the algorithm uses both supervised and weakly supervised loss functions, which makes it applicable to scenarios with unlabeled data or segment-level annotation. In addition, within the self-attention structure, the algorithm employs a smoother normalization function to balance the weight distribution among different positions.

In frame-level application scenarios, directly dividing the video into fixed-duration segments leads to significant boundary errors in the output segments. To address this issue, this thesis proposes a highlight detection algorithm based on dynamic segmentation. First, to introduce higher-precision temporal information, a fast pathway is constructed by stacking precision supplement modules, up-sampling modules, and down-sampling modules. Second, to obtain frame-accurate segment boundaries, the algorithm predicts relative offsets and uses an endpoint offset loss function to fit the target output. Additionally, the algorithm employs a shot segmentation module to further eliminate prediction errors and alleviate visual discontinuity.

Extensive comparative experiments and ablation studies are carried out on both algorithms. The results demonstrate that the algorithm based on information fusion is more accurate than existing methods, and that the algorithm based on dynamic segmentation significantly reduces the boundary error of the output segments. Finally, for the convenience of users, this thesis also designs a short video highlight detection system based on these algorithms.
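
The boundary refinement idea behind the dynamic segmentation algorithm (predict relative offsets for a coarse segment, then use shot boundaries to clean up the result) can be illustrated with a minimal Python sketch. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the function names, the choice to scale offsets by the segment duration, the snapping tolerance, and the use of a plain mean absolute error as a stand-in for the endpoint offset loss.

```python
from bisect import bisect_left


def refine_segment(start_s, end_s, start_offset, end_offset, fps):
    """Shift the endpoints of a fixed-duration segment by predicted offsets.

    Assumption: offsets are relative to the segment duration, so an offset
    of 0.1 moves an endpoint by 10% of the segment length.
    Returns (start_frame, end_frame).
    """
    duration = end_s - start_s
    new_start = start_s + start_offset * duration
    new_end = end_s + end_offset * duration
    return round(new_start * fps), round(new_end * fps)


def snap_to_shot(frame, shot_boundaries, tolerance):
    """Snap a frame index to the nearest shot boundary within `tolerance` frames.

    `shot_boundaries` must be sorted in ascending order. This is a
    hypothetical stand-in for a shot segmentation step.
    """
    if not shot_boundaries:
        return frame
    i = bisect_left(shot_boundaries, frame)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = shot_boundaries[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda b: abs(b - frame))
    return nearest if abs(nearest - frame) <= tolerance else frame


def endpoint_offset_loss(pred, target):
    """Mean absolute error over (start, end) offsets.

    Assumption: a simple L1 loss; the thesis may define its loss differently.
    """
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    # A coarse segment covering 4 s to 6 s of a 30 fps video.
    start_f, end_f = refine_segment(4.0, 6.0, 0.1, -0.25, fps=30)
    shots = [0, 120, 128, 300]  # hypothetical shot boundaries, in frames
    print(snap_to_shot(start_f, shots, tolerance=3),
          snap_to_shot(end_f, shots, tolerance=3))
```

In this sketch an endpoint is moved to a shot boundary only when one lies within the tolerance, so a confident prediction far from any cut is left unchanged.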
Keywords/Search Tags:short video, highlight detection, self-attention, dynamic video segmentation