| Person Re-Identification is an essential task that seeks to identify specific pedestrians across diverse scenes and cameras. Several challenges make this task demanding, including illumination shifts, person occlusion and truncation, pose variation, scene transfer, and camera differences. Despite these difficulties, Person Re-Identification is indispensable in fields such as security and autonomous driving. Conventional person re-identification studies have concentrated primarily on visible scenes, paying little attention to the low-light conditions of night-time surveillance systems. To overcome this limitation, researchers introduced Visible-Infrared Person Re-Identification. This technique enables cross-modality image retrieval between infrared and visible images, thereby enabling 24-hour suspect tracking. Nevertheless, eliminating the modality discrepancy remains a critical challenge. Algorithms for cross-modality person re-identification have recently focused on deep feature alignment by designing dual-stream networks and metric loss functions, which help models construct a shared feature space. This approach reduces modality-specific features while increasing modality-shared features. However, the domain differences between infrared and visible images stem primarily from differences in dense features. Mainstream methods explicitly align only deep features and lack direct supervision of dense features. This paper identifies the lack of explicit supervision of dense feature alignment in dual-stream networks, which makes it difficult to eliminate pixel-wise bias between modalities effectively. Explicit constraints on dense features are therefore necessary. Motivated by this, this paper proposes a dense feature alignment method based on contrastive learning. Considering the lack of pixel-level supervision, it designs a weakly supervised semantic learning method based on class activation maps and an unsupervised alignment method based on image mixup. Two
full-process paradigms, self-learning and self-constraint among dense features, are presented. The main contributions are as follows: (1) We introduce DCLNet, a contrastive-learning-based method for dense feature alignment that, for the first time in cross-modality person re-identification, imposes explicit constraints on dense features. Because the dual-stream network lacks sufficient capability to eliminate modality-specific information with pixel-wise bias in dense features, we employ dense contrastive learning to explicitly constrain the distance between positive and negative samples in dense feature maps, reducing modality differences from a new perspective. Our method significantly outperforms state-of-the-art networks in retrieval accuracy on the SYSU dataset. Extensive quantitative and qualitative experiments verify the effectiveness of our method in aligning pixels of the same semantic body parts across modalities. (2) We propose a weakly supervised semantic learning method based on Class Activation Maps, which achieves weakly supervised semantic segmentation in Visible-Infrared Person Re-Identification for the first time. To address the difficulty of sampling positive and negative pixel pairs, which arises because datasets lack pixel-level corresponding infrared and visible image pairs, we design a Part Aware Parsing module for seed region extraction and a Semantic Rectification Module for label propagation and optimization, generating complete, smooth, and fine pseudo-masks. Extensive visualization and ablation experiments verify the high quality of the generated masks and the effectiveness of each module. (3) We introduce Mixup Net, an unsupervised alignment method that extracts positive and negative sample pairs without prior knowledge. Mixup Net uses the image mixup technique to compute similarity matrices in the mixup image domain and derive positive and negative pixel pairs, which are then used to impose constraints on the original image
domain, facilitating cross-domain mutual learning. Our method outperforms state-of-the-art networks in retrieval accuracy on the RegDB dataset. Mixup Net and DCLNet each have their own strengths and weaknesses. |
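
The dense contrastive constraint described in contribution (1) can be illustrated with a minimal sketch. It is not the paper's implementation: the InfoNCE-style form of the loss, the cosine similarity, the temperature value, and the function names `l2_normalize` and `dense_contrastive_loss` are all assumptions made for illustration. Each visible-image pixel feature acts as an anchor; infrared pixels with the same body-part label (for instance, taken from the pseudo-masks) are treated as positives, and all other infrared pixels as negatives.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dense_contrastive_loss(vis_feats, ir_feats, vis_labels, ir_labels, temperature=0.1):
    """Hypothetical pixel-wise InfoNCE loss between two modalities.

    vis_feats / ir_feats: lists of per-pixel feature vectors.
    vis_labels / ir_labels: per-pixel body-part labels (assumed to be given).
    """
    vis = [l2_normalize(v) for v in vis_feats]
    ir = [l2_normalize(v) for v in ir_feats]
    if not ir:
        return 0.0

    total, count = 0.0, 0
    for anchor, anchor_label in zip(vis, vis_labels):
        # Cosine similarity of the anchor to every infrared pixel, scaled by temperature.
        logits = [sum(a * b for a, b in zip(anchor, cand)) / temperature for cand in ir]
        # Numerically stable log-sum-exp over all infrared pixels.
        max_logit = max(logits)
        log_denom = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        for logit, label in zip(logits, ir_labels):
            if label == anchor_label:  # positive pair: same semantic part
                total += log_denom - logit
                count += 1
    # Average over all positive pairs; 0.0 when no positive pair exists.
    return total / count if count else 0.0


# Toy usage: two pixels per modality, one per body part.
vis = [[1.0, 0.0], [0.0, 1.0]]
aligned_ir = [[1.0, 0.0], [0.0, 1.0]]
swapped_ir = [[0.0, 1.0], [1.0, 0.0]]
labels = [0, 1]
print(dense_contrastive_loss(vis, aligned_ir, labels, labels))
print(dense_contrastive_loss(vis, swapped_ir, labels, labels))
```

In this toy case the loss is small when same-part pixels share a direction in feature space and large when they do not, which is the behaviour an explicit dense alignment constraint is meant to encourage. A practical version would operate on batched tensors in a deep learning framework rather than on Python lists.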