With the development of mobile internet technology, the demand for quickly understanding video content keeps growing, and video understanding tasks have therefore received increasing attention from researchers. Among these tasks, language moment localization involves cross-modal high-level semantics: it aims to locate the specific time span of a relevant activity in a video based on a natural language description. Research on this task can mainly be divided into fully supervised and weakly supervised methods. The former requires meticulous annotation of video data, which is excessively costly; this thesis therefore focuses on the weakly supervised language moment localization task.

Previous weakly supervised models commonly adopt multiple instance learning and follow a moment-candidate-selection pipeline. However, lacking ground-truth moment annotations, these models suffer from optimization problems and are prone to falling into local minima during training, which mainly manifests as uncertainty in event temporal boundaries and incomplete semantic matching with the sentence. This thesis proposes a structure- and semantics-guided, pseudo-label-supervised localization pipeline to alleviate these problems. Specifically, the thesis first proposes an algorithm that learns a matching score curve between video frames (in this thesis, "video frames" default to superframes, i.e., series of continuous frames in the video) and the sentence query based on the video's structural information, instead of directly learning moment-sentence matching scores. This curve is used to generate pseudo-labels that supervise the localization network. Because the score curve covers the full video sequence and carries its temporal content structure, the proposed model can reduce learning uncertainty and localize moments that capture a more complete event process.

Secondly, to achieve complete semantic matching with the sentence, this thesis proposes a semantic contrastive training strategy and a semantic prediction module, which guide the model to learn from unmatched and matched video-sentence pairs respectively. In the contrastive training strategy, the thesis constructs contrastive samples containing both similar and different semantics, pushing the model to accurately distinguish semantics and achieve complete semantic matching, while the semantic prediction module achieves accurate visual-sentence alignment by restraining the activation of visual content in matched videos. Extensive experiments on the Charades-STA and ActivityNet Captions datasets achieve best or second-best results compared with state-of-the-art methods under multiple Rank n@IoU=m metrics. The code is publicly available at https://github.com/yetokun/WLML.
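To make the pseudo-label step concrete, the following is a minimal sketch of turning a frame-sentence matching score curve into a temporal pseudo-label. The thresholding rule (keep superframes above a fraction of the peak score and take the longest contiguous run) is an illustrative heuristic, not the thesis's actual algorithm; the function name and `ratio` parameter are assumptions.

```python
import numpy as np

def pseudo_label_from_curve(scores, ratio=0.5):
    """Derive a (start, end) superframe pseudo-label from a matching
    score curve: keep frames scoring at least ratio * max and return
    the longest contiguous run. Illustrative heuristic only."""
    scores = np.asarray(scores, dtype=float)
    mask = scores >= ratio * scores.max()
    best, best_len, run_start = (0, 0), 0, None
    # Append False so a run ending at the last frame is still closed.
    for i, on in enumerate(np.append(mask, False)):
        if on and run_start is None:
            run_start = i
        elif not on and run_start is not None:
            if i - run_start > best_len:
                best, best_len = (run_start, i - 1), i - run_start
            run_start = None
    return best

# Example: peak activity around superframes 2-4.
print(pseudo_label_from_curve([0.1, 0.2, 0.9, 0.8, 0.85, 0.1]))  # (2, 4)
```

Because the curve spans the whole video, such a rule naturally yields boundaries that follow the temporal structure of the event rather than isolated high-scoring candidates.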
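The contrastive training idea can be sketched as an InfoNCE-style objective: the matched video-sentence score should dominate the scores of the constructed unmatched (contrastive) samples. This is a generic formulation under assumed inputs, not the thesis's exact loss.

```python
import numpy as np

def semantic_contrastive_loss(matched_score, unmatched_scores, temperature=0.1):
    """InfoNCE-style loss: softmax over one matched score and several
    contrastive (unmatched) scores, penalizing low probability on the
    matched pair. Generic sketch, not the thesis's exact objective."""
    logits = np.array([matched_score] + list(unmatched_scores)) / temperature
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # matched pair is index 0
```

A model that scores the matched pair highest incurs a small loss; if a contrastive sample with different semantics scores higher, the loss grows, pushing the model to discriminate the semantics precisely.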
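The Rank n@IoU=m metric used in the experiments is standard in moment localization: a query counts as a hit if any of the top-n predicted moments overlaps the ground truth with temporal IoU of at least m. A minimal reference computation (helper names are mine):

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) intervals in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def rank_n_at_iou(predictions, ground_truths, n, m):
    """Fraction of queries whose top-n predictions contain at least one
    moment with temporal IoU >= m against the ground truth."""
    hits = sum(
        any(temporal_iou(p, gt) >= m for p in preds[:n])
        for preds, gt in zip(predictions, ground_truths)
    )
    return hits / len(ground_truths)

preds = [[(2, 6), (0, 3)], [(0, 1), (5, 9)]]
gts = [(4, 8), (5, 9)]
print(rank_n_at_iou(preds, gts, n=2, m=0.5))  # 0.5
```

Reporting several (n, m) pairs, as the thesis does, shows both the ranking quality and the boundary precision of the localized moments.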