Person re-identification is an important topic in video understanding for surveillance scenarios. It aims to match images of the same person between a query set and a gallery set, thereby enabling cross-scene person retrieval, and it plays an important role in intelligent security, maintaining social stability, and combating crime. In recent years, the all-weather deployment and operation of video surveillance equipment have made long-term monitoring possible. In addition, cameras that capture visible images in the daytime and infrared images at night provide effective hardware support for all-weather video surveillance. However, conventional person re-identification methods handle only visible images and cannot cope with the large discrepancy between the visible and infrared modalities or the high similarity of samples within each modality. The cross-modality person re-identification problem between visible and infrared images has therefore received increasing attention in the research community. This thesis studies cross-modality person re-identification between visible and infrared images. The main work is as follows:

(1) To address the tendency of existing methods to make the model overfit to local regions, a cross-modality person re-identification method based on a modality consistency network is proposed. Spatial information covering the whole pedestrian region is mined by enlarging the receptive field of the convolution layers, and channel information with richer semantic cues is mined with an attention-based channel aggregation block. Finally, a modality-consistent regularization is applied to reduce the discrepancy between the high-order features of heterogeneous images. The proposed method is validated on visible-infrared person re-identification datasets and achieves good performance: on the RegDB dataset it reaches 91.21% Rank-1 and 81.61% mAP in the visible-to-infrared query mode, and on the SYSU-MM01 dataset it reaches 56.72% Rank-1 and 55.88% mAP in the all-search mode.

(2) Existing pre-trained models ignore the potential correlations between different positions and channels within a single sample during feature extraction. This thesis therefore proposes a discriminative feature learning network based on vision transformers to explore these correlations, together with a triplet-aided hetero-center loss that learns more discriminative feature representations by balancing the cross-modality and intra-modality distances between feature centers. The method achieves high accuracy on existing datasets: on the RegDB dataset it reaches 92.10% Rank-1 and 82.11% mAP in the visible-to-infrared query mode, and on the SYSU-MM01 dataset it reaches 59.84% Rank-1 and 57.70% mAP in the all-search mode. A minimal sketch of such a center-based loss follows.
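The abstract does not give the exact formulation of the triplet-aided hetero-center loss; the following PyTorch-style sketch only illustrates the general idea, assuming that per-identity feature centers are computed separately for the visible and infrared modalities, pulled together across modalities for the same identity, and pushed apart across identities with a triplet-style margin. All tensor names and the margin value are hypothetical.

```python
import torch
import torch.nn.functional as F

def hetero_center_triplet_loss(feats, labels, modalities, margin=0.3):
    """Illustrative sketch of a triplet-aided hetero-center loss.

    feats:      (N, D) feature embeddings of a mini-batch
    labels:     (N,)   person identity labels
    modalities: (N,)   0 for visible, 1 for infrared
    """
    centers, center_ids = [], []
    for pid in labels.unique():
        for m in (0, 1):
            mask = (labels == pid) & (modalities == m)
            if mask.any():
                centers.append(feats[mask].mean(dim=0))  # modality-specific identity center
                center_ids.append(pid)
    centers = torch.stack(centers)        # (C, D)
    center_ids = torch.stack(center_ids)  # (C,)

    dist = torch.cdist(centers, centers)  # pairwise distances between centers
    loss = feats.new_zeros(())
    for i in range(len(centers)):
        pos = center_ids == center_ids[i]
        pos[i] = False                    # same identity, other modality
        neg = center_ids != center_ids[i]
        if pos.any() and neg.any():
            # pull cross-modality centers of the same identity together,
            # push centers of different identities apart by at least `margin`
            loss = loss + F.relu(dist[i][pos].max() - dist[i][neg].min() + margin)
    return loss / len(centers)
```

In practice such a term would be combined with an identification loss; the sketch shows only the center-balancing component.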
(3) To address the insufficient representation of discriminative information in the backbone network and the misalignment of person parts, this thesis proposes a cross-modality person re-identification method based on graph aggregation learning, which uses the correlation-modeling ability of graph convolutional networks to preserve underlying discriminative features and supplement them in the final person feature representation. In addition, to improve the discriminability of part features, a part attention aggregation module is designed that uses a self-attention mechanism to mine the relationships between parts (a minimal sketch of such a self-attention aggregation follows the summary below). The method achieves good accuracy in comparison and ablation experiments: on the RegDB dataset it reaches 87.48% Rank-1 and 75.59% mAP in the visible-to-infrared query mode, and on the SYSU-MM01 dataset it reaches 58.76% Rank-1 and 57.67% mAP in the all-search mode.

In summary, this thesis focuses on the cross-modality differences and intra-modality similarities encountered in cross-modality person re-identification. It addresses the prominent shortcomings of deep semantic feature extraction methods and the difficulty of extracting discriminative features across the two modalities. The accuracy of person re-identification in cross-modality scenarios is improved, which is of theoretical significance and practical value for the development of person re-identification.
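The abstract does not specify the structure of the part attention aggregation module; the following PyTorch-style sketch is only an illustration of self-attention over part features, assuming the pedestrian feature map has already been split into a fixed number of horizontal part features. All module and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class PartAttentionAggregation(nn.Module):
    """Illustrative sketch: refine P part features with self-attention, then aggregate."""

    def __init__(self, dim=2048, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, part_feats):
        # part_feats: (B, P, D) -- P horizontal part features per image
        attended, _ = self.attn(part_feats, part_feats, part_feats)
        part_feats = self.norm(part_feats + attended)  # residual connection
        # aggregate the refined part features into a single descriptor
        return part_feats.mean(dim=1)                  # (B, D)

# usage sketch: batch of 8 images, 6 horizontal parts each
parts = torch.randn(8, 6, 2048)
global_feat = PartAttentionAggregation()(parts)        # (8, 2048)
```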