Research on Multi-scene Person Re-identification Algorithm Based on Local Representation Learning

Posted on: 2024-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Chen
GTID: 2568307106477924
Subject: Computer Science and Technology
Abstract/Summary:
Person re-identification (ReID) is a key subtask of image retrieval that uses representation learning and similarity measurement to search for persons across devices (cameras). Owing to its broad application prospects in social security projects such as ‘safe city’ and ‘intelligent monitoring’, person ReID has become a research focus in computer vision. In real-world scenes, the person data used for retrieval is diverse, including images, video sequences, and text descriptions, and each scenario poses its own challenges that call for a dedicated algorithm. In addition, global representation methods struggle to capture the key details of persons, which leads to features with weak discriminative ability and insufficient robustness. Therefore, based on local representation learning, this thesis designs efficient ReID methods for image-based, video-based, and text-based retrieval scenarios, as follows:

(1) In the image-based retrieval scenario, to address the feature domain gap caused by resolution differences, this thesis proposes a multi-resolution and multi-granularity joint representation method. Specifically, the method designs an encoder-decoder-based resolution reconstruction network that reconstructs the original image into high- and low-resolution versions. A multi-resolution representation and fusion network is then introduced; it adopts a multi-branch convolutional network in which each branch fuses both global and local features. The method therefore enriches the extracted discriminative information from multiple perspectives while unifying the resolution scale of the features.

(2) In the video-based retrieval scenario, to address the local misalignment caused by the jitter of person bounding boxes, this thesis proposes a reference-aided part-aligned feature disentangling method for video person ReID. Specifically, the method first designs a pose-based reference feature learning network, which uses a pose estimation model to locate the key points of the reference frame in each video sequence and thereby provides a common alignment standard across videos. A relation-based local feature disentangling network is then explored to achieve intra-video alignment; it adopts an attention mechanism to mine strongly correlated regions within each video. The method can thus achieve local alignment both between and within videos.

(3) In the text-based retrieval scenario, existing local-based methods rely on additional models to assist local matching, which adds complexity. To address this, this thesis proposes a local-aware text-based person search method. Specifically, a visual-guided textual local representation network is designed to filter locally relevant textual features by reducing the domain gap between global textual features and local visual features. A multi-stage cross-modal matching strategy is then proposed that performs cross-modal feature projection at the shallow, local, and global levels, so that the domain gap is reduced gradually. The method can therefore perform local cross-modal matching without auxiliary models. Finally, this thesis also designs an improved version that enhances local fusion and feature measurement to further optimize performance.

The above methods have been extensively tested on mainstream datasets and achieved accuracy competitive with contemporaneous methods.
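The retrieval step named at the start of the abstract (representation learning followed by similarity measurement) can be illustrated with a minimal sketch. It assumes each image has already been encoded into a fixed-length feature vector and ranks gallery entries by cosine similarity to the query. The function name `rank_gallery` and the choice of cosine similarity are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

# Illustrative sketch (assumption): features are precomputed vectors and
# similarity is measured with cosine similarity.
def rank_gallery(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Return gallery indices sorted from most to least similar to the query.

    query:   feature vector of shape (D,)
    gallery: feature matrix of shape (N, D), one row per gallery image
    """
    # L2-normalise so that the dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q               # (N,) cosine similarities
    return np.argsort(-scores)   # indices in descending order of similarity

# Toy example: gallery item 1 points in almost the same direction as the query
query = np.array([1.0, 0.0, 1.0])
gallery = np.array([[0.0, 1.0, 0.0],
                    [2.0, 0.1, 2.0],
                    [1.0, 1.0, 0.0]])
print(rank_gallery(query, gallery))  # → [1 2 0]
```

Normalising both sides first makes the ranking insensitive to feature magnitude, which is a common convention in ReID retrieval; a learned metric could replace the dot product without changing the ranking logic.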
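Item (1) describes branches that each fuse a global feature with local features. A common way to obtain local features in ReID is to split a backbone's feature map into horizontal stripes and pool each stripe (as in part-based models such as PCB). The sketch below assumes that stripe scheme, average pooling, concatenation as the per-branch fusion, and a plain average across the high- and low-resolution branches. The thesis's actual branch design and fusion rule may differ, and `multi_granularity_feature` and `fuse_resolutions` are hypothetical names.

```python
import numpy as np

# Illustrative sketch (assumption): stripe pooling for local features,
# concatenation within a branch, averaging across resolution branches.
def multi_granularity_feature(fmap: np.ndarray, num_parts: int = 3) -> np.ndarray:
    """Concatenate one global and `num_parts` stripe-level descriptors.

    fmap: feature map of shape (H, W, C) from one CNN branch.
    Returns a vector of shape ((1 + num_parts) * C,).
    """
    if fmap.shape[0] < num_parts:
        raise ValueError("feature map height must be >= num_parts")
    global_feat = fmap.mean(axis=(0, 1))                  # (C,) global average pooling
    stripes = np.array_split(fmap, num_parts, axis=0)     # horizontal stripes along H
    local_feats = [s.mean(axis=(0, 1)) for s in stripes]  # one (C,) vector per stripe
    return np.concatenate([global_feat, *local_feats])

def fuse_resolutions(fmaps: list[np.ndarray], num_parts: int = 3) -> np.ndarray:
    """Average the multi-granularity descriptors of several resolution branches.

    Pooling removes the spatial size, so maps of different resolutions
    yield descriptors of the same length and can be fused directly.
    """
    feats = [multi_granularity_feature(f, num_parts) for f in fmaps]
    return np.mean(feats, axis=0)

rng = np.random.default_rng(0)
high_res = rng.random((24, 8, 16))  # H=24, W=8, C=16
low_res = rng.random((12, 4, 16))   # half the spatial resolution
fused = fuse_resolutions([high_res, low_res])
print(fused.shape)  # → (64,)
```

The example shows why pooling helps unify resolution scale: the two branches have different spatial sizes but produce descriptors of identical length.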
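Item (2) uses an attention mechanism to mine, within each video, the regions that correlate strongly with a reference. One way to picture this is scaled dot-product attention in which a reference part feature (assumed here to come from the pose-guided reference frame) acts as the query, while the same part's features across the T frames act as keys and values. Frames whose part feature matches the reference receive more weight, so a jittered box that shows a different region contributes less. This is a simplified stand-in for the thesis's relation-based disentangling network, and `aggregate_part` is a hypothetical helper.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

# Illustrative sketch (assumption): reference-guided attention pooling of one
# body part over the frames of a video, not the thesis's exact network.
def aggregate_part(reference: np.ndarray,
                   frame_parts: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Attention-pool one body part across the frames of a video.

    reference:   (D,) reference feature for this part
    frame_parts: (T, D) the same part's feature in each of T frames
    Returns the aggregated (D,) part feature and the (T,) attention weights.
    """
    d = reference.shape[0]
    scores = frame_parts @ reference / np.sqrt(d)  # (T,) relevance of each frame
    weights = softmax(scores)
    return weights @ frame_parts, weights

reference = np.array([1.0, 0.0])
frame_parts = np.array([[1.0, 0.0],   # well aligned with the reference
                        [0.0, 1.0],   # e.g. a jittered box showing another region
                        [0.9, 0.1]])
pooled, w = aggregate_part(reference, frame_parts)
print(w.argmax())  # → 0
```

Applying the same function per body part, with one reference feature per part, gives a set of part features that are aligned to a shared standard across videos.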
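Item (3) obtains textual local features under visual guidance rather than from an auxiliary model. A minimal sketch of that idea, assuming cross-attention in which each local visual feature (for example a horizontal stripe) queries the word-level text features: the attended text vector becomes that part's textual local feature, and the image-text score is the mean cosine similarity over the matched part pairs. The function names, the assumption that both modalities are already projected into a shared D-dimensional space, and the mean-over-parts scoring rule are illustrative choices, not the thesis's exact design.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (N, D) arrays."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return (a * b).sum(axis=-1)

# Illustrative sketch (assumption): visual parts attend over word features
# that already live in the same embedding space as the visual features.
def visual_guided_text_parts(visual_parts: np.ndarray, words: np.ndarray) -> np.ndarray:
    """For each visual part (P, D), attend over word features (L, D) -> (P, D)."""
    scores = visual_parts @ words.T / np.sqrt(words.shape[1])  # (P, L)
    scores -= scores.max(axis=1, keepdims=True)                 # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)                     # softmax over words
    return attn @ words                                         # (P, D)

def local_match_score(visual_parts: np.ndarray, words: np.ndarray) -> float:
    """Mean cosine similarity between each visual part and its textual counterpart."""
    text_parts = visual_guided_text_parts(visual_parts, words)
    return float(cosine(visual_parts, text_parts).mean())

visual_parts = np.array([[1.0, 0.0], [0.0, 1.0]])  # e.g. upper- and lower-body parts
words_match = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
words_other = np.array([[-1.0, 0.0], [0.0, -1.0]])
print(local_match_score(visual_parts, words_match)
      > local_match_score(visual_parts, words_other))  # → True
```

Because the visual parts decide which words each textual local feature draws from, no separate parser or segmentation model is needed to pair local regions with phrases in this sketch.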
Keywords/Search Tags: Deep learning, Person re-identification, Representation learning, Cross-modality