
Deep Attention Based Cross-Modal Person Search Via Natural Language Descriptions

Posted on: 2020-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: S J Li
Full Text: PDF
GTID: 2518306518464804
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of society, video surveillance is widely deployed to maintain public security. Although surveillance footage provides the police with valuable clues, searching for those clues in videos manually costs considerable time and manpower. Cross-modal person search via natural language descriptions is an emerging technology that alleviates this problem: it aims to find a target person across different surveillance videos using free-form natural language queries. This is a highly challenging fine-grained cross-modal retrieval task.

First, considering the diversity and redundancy of natural language descriptions, we propose an attention-based cross-modal fusion person search algorithm, which constructs a Description-Strengthened and Fusion-Attention Network (DSFA-Net) to enhance the textual description information. DSFA-Net strengthens the text description through an attention mechanism and cross-modal feature fusion, making discriminative words more visually sensitive and establishing close relations between important words and the corresponding regions of the image.

Second, we propose an attention-based multi-modal alignment person search algorithm. This algorithm designs an attention-constrained model for person identity classification, which guides images and texts to be mapped into an appropriate common space. In addition, the algorithm puts forward a multi-modal alignment method: a fusion modality is generated from the image and text modalities, and the distances among the three modalities are measured with a novel cross ranking loss that makes different matching pairs separable in the common space. We call the network built from these methods the Multi-modal Alignment and Attention Network (MAA-Net).

Extensive experiments on the popular public CUHK-PEDES dataset demonstrate the superiority of both proposed approaches.
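The idea of ranking distances among image, text, and fused modalities can be sketched as follows. This is a minimal illustrative version, not the thesis's actual formulation: it assumes unit-normalized embeddings, cosine similarity, element-wise-sum fusion, a hinge margin, and a single in-batch negative per sample, none of which are specified in this abstract.

```python
import numpy as np

def l2norm(x):
    # Normalize each row to unit length so dot products act as cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def ranking_loss(anchor, positive, negative, margin=0.2):
    # Hinge-style ranking loss: a matched pair should score higher than a
    # mismatched pair by at least `margin`.
    pos = np.sum(anchor * positive, axis=1)
    neg = np.sum(anchor * negative, axis=1)
    return np.mean(np.maximum(0.0, margin - pos + neg))

def cross_ranking_loss(img, txt, margin=0.2):
    # img, txt: (N, d) embeddings of N matched image/text pairs.
    img, txt = l2norm(img), l2norm(txt)
    fused = l2norm(img + txt)               # element-wise-sum fusion (assumption)
    neg_idx = np.roll(np.arange(len(img)), 1)  # shift by one -> mismatched pairs
    # Rank across modality pairs: image-text both directions, plus each
    # single modality against the fused modality.
    pairs = [(img, txt), (txt, img), (img, fused), (txt, fused)]
    return sum(ranking_loss(a, p, p[neg_idx], margin) for a, p in pairs) / len(pairs)
```

With perfectly matched, mutually orthogonal embeddings the loss vanishes; in training, minimizing it pulls matched image/text/fused triples together while pushing mismatched ones apart.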
Keywords/Search Tags: Cross-modal person search, Attention mechanism, Cross-modal fusion, Natural language description