| With the rapid development of technology and growing national prosperity, person re-identification has become an important tool for maintaining public safety, for example in monitoring and tracking suspicious people in airports, railway stations, and shopping centers. However, many person features cannot be recognized effectively in dark conditions, which seriously degrades re-identification accuracy. Cross-modal person re-identification has therefore been proposed and widely studied. It tracks and identifies people across visible and infrared images by measuring the similarity between images of the two modalities, and it has received widespread attention and application in recent years. Existing cross-modal person re-identification methods nevertheless have two limitations. First, many metric-learning-based methods, such as those built on triplet loss or hard-sample triplet loss, rely heavily on manually set margin hyperparameters. Second, most cross-modal methods discard modality-specific features and perform re-identification using only the features shared between visible and infrared images. This thesis addresses both shortcomings and studies and improves cross-modal person re-identification based on metric learning, as follows. (1) An adaptive global-local feature joint network for person re-identification is constructed. It acquires local features adaptively through spatial attention and learns them jointly with the shared global pedestrian features, which improves the representational power of the person features. A weighted regularized heterogeneous center triplet loss is also proposed, in which an adaptive weighted regularization optimizes the margin hyperparameters used when computing the distances between the centers of different classes of samples, strengthening the constraint on feature centers. The proposed method achieves 75.10% Rank-1 accuracy on the SYSU-MM01 dataset and 89.0% Rank-1 accuracy on the RegDB dataset, which experimentally demonstrates its effectiveness. (2) A Transformer-based modality-complementary cross-modal person re-identification method is designed. It uses a Transformer as an auxiliary module to learn modality-specific person features and to capture long-range dependencies between person pixels. A feature-level modality compensation network then generates the features of the missing modality, and the generated specific features are fused with the shared and specific features to form a more robust pedestrian representation. The proposed method achieves 77.07% Rank-1 accuracy on the SYSU-MM01 dataset and 86.30% Rank-1 accuracy on the RegDB dataset, which experimentally demonstrates its effectiveness. |
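To make the center-based loss in contribution (1) more concrete, the following is a minimal sketch of a plain heterogeneous center triplet loss with a fixed margin. It is not the thesis's method: the adaptive weighted regularization of the margin is not specified in this abstract and is not reproduced here. The function names `center` and `hetero_center_triplet_loss`, the default `margin=0.3`, and the use of Euclidean distance are all assumptions made for illustration. The sketch also assumes that every identity appears in both modalities and that there are at least two identities.

```python
import math


def center(features):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    return [sum(col) / len(features) for col in zip(*features)]


def hetero_center_triplet_loss(vis_feats, vis_ids, ir_feats, ir_ids, margin=0.3):
    """Fixed-margin heterogeneous center triplet loss (illustrative only).

    Assumes every identity appears in both modalities and that there are
    at least two identities. The fixed margin is a stand-in assumption.
    """
    centers = []  # (identity, center vector), one per identity and modality
    for feats, ids in ((vis_feats, vis_ids), (ir_feats, ir_ids)):
        for pid in sorted(set(ids)):
            members = [f for f, i in zip(feats, ids) if i == pid]
            centers.append((pid, center(members)))

    loss = 0.0
    for a, (pid_a, c_a) in enumerate(centers):
        # Positive: the same identity's center in the other modality.
        pos = max(math.dist(c_a, c) for b, (pid, c) in enumerate(centers)
                  if pid == pid_a and b != a)
        # Negative: the closest center of any other identity, either modality.
        neg = min(math.dist(c_a, c) for pid, c in centers if pid != pid_a)
        # Hinge: penalize only when the negative is not at least `margin` farther.
        loss += max(margin + pos - neg, 0.0)
    return loss
```

Working on centers rather than individual samples pulls the visible and infrared centers of one identity together while pushing the centers of different identities apart. The loss is zero once every identity's cross-modal center distance is smaller than its distance to the nearest other identity by at least the margin.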