| An intelligent surveillance system provides intelligent understanding and analysis of visual information. Such systems are widely used in intelligent security, intelligent transportation, and related fields. Person re-identification (person ReID) is a key component of the surveillance system: it aims to match pedestrians with the same identity across non-overlapping cameras in a camera network, providing clues for fast and efficient person tracking in large-scale surveillance networks. Existing studies mostly focus on closed-set datasets under relatively ideal conditions (i.e., a single environment, a single modality, and sufficient labels). However, real-world environments are usually open-set and involve more complex factors, so the performance of traditional person ReID models degrades significantly and cannot meet the requirements of intelligent surveillance systems. The complexities of person ReID in open environments are reflected in single-environment complexity in fixed-view surveillance environments, cross-domain complexity in cross-view surveillance environments, and unknown-environment complexity in cross-view surveillance environments. These three aspects are specified as follows: (1) In a single complex environment, the available images of the target pedestrian are often incomplete in terms of body information. The target pedestrian is easily occluded by other pedestrians or objects. Sometimes no image can be obtained at all, and the search can rely only on an eyewitness's textual description of the target pedestrian. Incomplete person images leave existing methods lacking robustness. (2) In cross-domain complex environments, it is hard to obtain sufficient manually annotated data in new environments. Cross-domain distribution discrepancies give existing person ReID models poor direct transfer
performance, so existing methods adapt poorly. (3) In unknown complex environments, privacy concerns in specific scenarios make it difficult to collect corresponding images for training, so existing methods generalize poorly. This thesis studies solutions to these urgent problems of person ReID in open environments. For occluded/partial person ReID in a single complex environment, this thesis proposes a network named Pose-Guided Feature Alignment Learning with Knowledge Distillation (PGFL-KD). PGFL-KD emphasizes the features of visible body parts while excluding interference from obstructions. It also encourages different channel groups to focus on different body parts, yielding representations aligned with body-part semantics. To remove the dependency on pose information at test time while keeping accuracy high, the method regularizes the main branch to learn the merits of the pose-guided branches through knowledge distillation and interaction-based training, improving retrieval speed while preserving retrieval accuracy. For text-based person ReID in a single complex environment, this thesis proposes a novel hierarchical Gumbel attention network for text-based person search based on the Gumbel top-k re-parameterization algorithm. Specifically, it adaptively selects the image regions or words/phrases with strong semantic relevance from images and texts for precise alignment and similarity calculation. This hard selection strategy fuses the strongly relevant multi-modality features, alleviating semantic misalignment and matching redundancy. Extensive experiments on a text-based person ReID benchmark demonstrate that the method performs favorably against state-of-the-art methods. For domain-adaptive person ReID in cross-domain complex environments, this thesis proposes a hard noisy label refinery method and a soft noisy label refinery method. For the soft noisy label
refinery method, this thesis proposes to estimate and exploit the credibility of the pseudo-label assigned to each sample, alleviating the influence of noisy labels by suppressing the contribution of noisy samples. The underlying observation is that a sample given a wrong pseudo-label by clustering generally shows weaker consistency between the outputs of the mean teacher model and the student model. Based on this finding, the method uses uncertainty (measured by the level of consistency) to evaluate the reliability of a sample's pseudo-label and incorporates the uncertainty to re-weight the sample's contribution within the various ReID losses. For the hard noisy label refinery method, this thesis proposes a group-aware label transfer algorithm. Specifically, the label transfer algorithm uses pseudo-labels to train on the data while simultaneously refining those pseudo-labels, acting as an online clustering algorithm. More importantly, the method introduces a group-aware strategy that assigns implicit attribute group identifications to samples. Combining the online label refining algorithm with the group-aware strategy better corrects noisy pseudo-labels in an online fashion and narrows the search space of the target identity. For domain-generalizable person ReID in unseen complex environments, this thesis proposes a simple yet effective calibrated feature decomposition module. This method adopts a channel attention mechanism to decompose the person representation into a purer identity-relevant feature, domain features, and a remaining entangled feature. To provide a more complete and calibrated person representation for the feature decomposition module, a calibrated-and-standardised batch normalization is designed to jointly explore intra-domain calibration and inter-domain standardisation of multi-source domain features. Then, to enhance generalization ability and ensure high discrimination of the purer identity-relevant feature, a calibrated
instance normalization is introduced to reinforce discriminative identity-relevant information and filter out identity-irrelevant information. Extensive experiments demonstrate the strong generalization capability of the framework. This thesis explores multi-domain and multi-modality person re-identification methods for the above three complex scenarios, addressing the core problems of insufficient model robustness, adaptation, and generalization. It explores the learning of invariant features in cross-domain and cross-modal data, building knowledge-transfer bridges between different domains and modalities, reducing data distribution discrepancies, and promoting the development of person re-identification systems with high robustness, high adaptability, and strong generalization. |
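
The soft noisy label refinery idea described in the abstract, which down-weights samples whose teacher and student outputs disagree, can be illustrated with a minimal Python sketch. The abstract does not specify the exact formulation, so several details here are assumptions made only for illustration: consistency is measured with the KL divergence between the teacher and student class distributions, the weight is taken as `exp(-uncertainty)`, and the re-weighted loss is a plain cross-entropy. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def uncertainty_weighted_ce(student_logits, teacher_logits, pseudo_labels):
    """Cross-entropy on pseudo-labels, re-weighted per sample.

    Assumption: uncertainty is the teacher/student KL divergence and the
    weight is exp(-uncertainty), so inconsistent samples contribute less.
    """
    weights, losses = [], []
    for s, t, y in zip(student_logits, teacher_logits, pseudo_labels):
        p_student, p_teacher = softmax(s), softmax(t)
        uncertainty = kl_divergence(p_teacher, p_student)
        weights.append(math.exp(-uncertainty))
        losses.append(-math.log(p_student[y] + 1e-12))
    # Normalise by the total weight so the loss scale stays comparable.
    loss = sum(w * l for w, l in zip(weights, losses)) / sum(weights)
    return loss, weights


# Toy batch: sample 0 is consistent, sample 1 is not (teacher disagrees).
student_logits = [[4.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
teacher_logits = [[4.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
pseudo_labels = [0, 0]

loss, weights = uncertainty_weighted_ce(student_logits, teacher_logits, pseudo_labels)
print("per-sample weights:", weights)
print("re-weighted loss:", loss)
```

In this toy batch the second sample, whose teacher and student predictions disagree, receives a smaller weight than the first, so a likely noisy pseudo-label contributes less to the loss. In a real ReID pipeline the same weighting would presumably be applied to each loss term computed from network outputs rather than to hand-written logits.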