
Deep Attention Based Cross-Modal Person Search Via Natural Language Descriptions

Posted on: 2020-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: S J Li
Full Text: PDF
GTID: 2518306518464804
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of society, video surveillance is widely deployed to maintain public security. Although surveillance footage provides the police with valuable clues, searching for those clues in videos manually costs considerable time and manpower. Cross-modal person search via natural language descriptions is an emerging technology that alleviates this problem: it aims to find a target person across different surveillance videos using free-form natural language queries. This is a highly challenging fine-grained cross-modal retrieval task.

First, considering the diversity and redundancy of natural language descriptions, we propose an attention-based cross-modal fusion person search algorithm, which constructs a Description-Strengthened and Fusion-Attention Network (DSFA-Net) to enhance the textual description information. DSFA-Net strengthens the text description through an attention mechanism and cross-modal feature fusion, making discriminative words more visually sensitive and establishing close relations between important words and the corresponding regions of the image.

Second, we propose an attention-based multi-modal alignment person search algorithm. This algorithm designs an attention-constrained model for person identity classification, which guides images and texts to be mapped into an appropriate common space. In addition, the algorithm puts forward a multi-modal alignment method: a fusion modality is generated from the image and text modalities, and the distances among the three modalities are measured with a novel cross ranking loss that makes different matching pairs separable in the common space. We call the network built from these methods the Multi-modal Alignment and Attention Network (MAA-Net).

Extensive experiments on the popular public CUHK-PEDES dataset demonstrate the superiority of both proposed approaches.
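The idea of ranking distances among image, text, and fused modalities can be sketched as follows. This is a minimal illustrative version, not the thesis's actual formulation: it assumes unit-normalized embeddings, cosine similarity, element-wise-sum fusion, a hinge margin, and a single in-batch negative per sample, none of which are specified in this abstract.

```python
import numpy as np

def l2norm(x):
    # Normalize each row to unit length so dot products act as cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def ranking_loss(anchor, positive, negative, margin=0.2):
    # Hinge-style ranking loss: a matched pair should score higher than a
    # mismatched pair by at least `margin`.
    pos = np.sum(anchor * positive, axis=1)
    neg = np.sum(anchor * negative, axis=1)
    return np.mean(np.maximum(0.0, margin - pos + neg))

def cross_ranking_loss(img, txt, margin=0.2):
    # img, txt: (N, d) embeddings of N matched image/text pairs.
    img, txt = l2norm(img), l2norm(txt)
    fused = l2norm(img + txt)               # element-wise-sum fusion (assumption)
    neg_idx = np.roll(np.arange(len(img)), 1)  # shift by one -> mismatched pairs
    # Rank across modality pairs: image-text both directions, plus each
    # single modality against the fused modality.
    pairs = [(img, txt), (txt, img), (img, fused), (txt, fused)]
    return sum(ranking_loss(a, p, p[neg_idx], margin) for a, p in pairs) / len(pairs)
```

With perfectly matched, mutually orthogonal embeddings the loss vanishes; in training, minimizing it pulls matched image/text/fused triples together while pushing mismatched ones apart.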
Keywords/Search Tags: Cross-modal person search, Attention mechanism, Cross-modal fusion, Natural language description