Person re-identification (Re-ID) aims to retrieve a specific person across multiple disjoint cameras. Most past studies rest on a set of strong assumptions: every person image contains a complete body, and all images belong to a single visible-light (RGB) modality. In reality, cameras are mounted at fixed positions while pedestrians move freely through the scene, so occlusions arise easily during capture. In addition, to alleviate the poor imaging quality of low-light environments, infrared (IR) cameras are now deployed alongside RGB ones, and images captured in the two modalities often fail to match under traditional models. Both situations are ubiquitous and unavoidable, so research on person Re-ID in the two major scenarios of occlusion and cross-modality matching has far-reaching scientific and industrial significance.

For occluded person Re-ID, two classes of advanced methods exist. The first is the two-stage approach based on a pose estimation model. Although it reduces the influence of occlusion on feature extraction, the resulting model is relatively complex, and an off-the-shelf pose estimator may introduce additional bias. The second is based on the attention mechanism. Although it can perceive the visible regions of pedestrians more accurately, current research on such attention mechanisms remains shallow, and the generated person features are of a single scale. Existing advanced cross-modality person Re-ID methods follow the modality-shared feature learning route, using feature mapping or modality disentanglement to learn modality-shared features. Although this research route is clear, the existing models are generally complex and their design ideas hard to follow, and an efficient, convenient way to process cross-modality person images is still lacking. Aiming at the deficiencies of existing methods, this thesis conducts
research on person re-identification methods in occluded and cross-modality scenes. The proposed methods achieve excellent performance on traditional, occluded, and cross-modality person Re-ID datasets. The specific research contents are as follows:

(1) A new person Re-ID network based on the Transformer and multi-scale feature fusion is proposed. Based on an investigation of the relationship between the original attention layers and the input of the Transformer encoding layers, an adaptive person-nuance interest search module is proposed and embedded in the aforementioned multi-scale fusion network to construct a richer multi-scale person feature representation than global features alone. The proposed approach demonstrates better performance on traditional person Re-ID tasks.

(2) A Transformer-based multi-scale, weakly supervised, occlusion-resistant person Re-ID network is proposed, which includes a shared non-occluded-region locating module that generates cropped images of the visible regions of pedestrians. Embedded in the proposed network, the module uses the raw feature maps collected in the Transformer encoding layers, together with weak supervision, to roughly localize the visible regions. This yields more expressive multi-granularity person features, achieves state-of-the-art results on the Occluded-REID dataset, and improves matching accuracy in occluded scenes.

(3) Following the modality-shared feature learning route, a Transformer-based cross-modality person Re-ID benchmark model is proposed, together with a grayscale data augmentation strategy. The benchmark model uses specially designed independent-shared branches and a multi-head self-attention mechanism to capture powerful modality-shared person features. The data augmentation strengthens the model's ability to recognize modality-independent features from a data-driven perspective, further enriching the modality-shared person features. The proposed method demonstrates excellent performance on cross-modality person Re-ID tasks.
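The grayscale augmentation idea in contribution (3) can be illustrated with a minimal sketch: randomly replacing an RGB training image with its three-channel grayscale version discards color cues that the IR modality lacks, nudging the model toward modality-shared features. The probability value, the luma weights, and the function name below are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def random_grayscale(img: np.ndarray, p: float = 0.5, rng=None) -> np.ndarray:
    """With probability p, replace an RGB image of shape (H, W, 3) with
    its grayscale version broadcast back to 3 channels, so the augmented
    image keeps the network's expected input shape."""
    rng = rng or np.random.default_rng()
    if rng.random() >= p:
        return img  # leave the original RGB image untouched
    # Standard ITU-R BT.601 luma transform (an assumed choice of weights).
    gray = img @ np.array([0.299, 0.587, 0.114])
    return np.repeat(gray[..., None], 3, axis=-1)
```

In a training pipeline, such a transform would typically be applied only to the RGB branch, leaving IR images unchanged, so that both modalities present the model with color-free appearance cues part of the time.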