| Person re-identification, one of the most popular research areas in image retrieval, has many potential applications in intelligent security, unmanned supermarkets, pandemic prevention, and epidemiological studies. Person re-identification that is close to, or can be used in, practical application scenarios is called open-world person re-identification. Representative tasks include unsupervised domain adaptation person re-identification and visible thermal cross-modality person re-identification. In unsupervised domain adaptation, the identity categories of the source domain dataset and the target domain dataset are independent of each other, so narrowing the gap between the source domain and the target domain is the key to solving the problem. The performance of unsupervised domain adaptation person re-identification has recently improved substantially with the development of clustering-based pseudo-label generation, but the pseudo-labels are noisy. Producing higher-quality pseudo-labels has therefore become a popular research direction for this task. In visible thermal cross-modality person re-identification, retrieving the same identity from a gallery that contains both infrared and RGB images is challenging. Reducing modality differences and learning robust modality-shared features are the keys to solving this problem. Most current cross-modality approaches concentrate on global image features, and few focus on local features. Consequently, exploiting local features that carry discriminative information is a promising direction for visible thermal cross-modality person re-identification. This paper studies the two typical open-world tasks mentioned above, as follows: (1) This research introduces the Transformer structure and proposes a ViT-based algorithm for unsupervised domain adaptation person re-identification methods that use the conventional single-branch network
structure. Because the Transformer relies on pre-trained models, is difficult to train from scratch, has a large number of parameters, and is hard to converge, this work seeks to introduce the Swin Transformer, a more efficient Transformer model, to accelerate model learning. Inspired by this, the paper proposes a new solution that balances performance and efficiency: a simplified Transformer (Mini-trans Former, MF) structure/strategy that contains only a single block. MF can be trained from scratch without any pre-training parameters, and using more heads allows it to capture diverse information comparable to the original ViT. This paper further proposes a convolution-simplified Transformer framework (CNN with Mini-trans Former, CMF) that combines the plug-and-play MF with a convolutional neural network, so that CMF has the advantages of both convolution and Transformer. Within CMF, three distinct MF strategies are presented: Fat, SAndwich, and Thin. Results on four mainstream tasks over three commonly used datasets show that CMF can significantly improve the performance of unsupervised domain adaptation person re-identification. It is worth noting that CMF is also a general framework compatible with other vision tasks. (2) For unsupervised domain adaptation person re-identification methods based on dual-branch collaborative learning, this paper constructs dual-branch networks with different structures to enhance their complementarity. Two complementary learning schemes are explored. In the first, three classic pooling methods (global average pooling, global max pooling, and generalized mean pooling) are assigned to the two networks, respectively. This simplest scheme brings a significant performance improvement to the collaborative learning framework. In the second, the two networks employ the original ResNet50 or different CMFs, respectively; two networks with large structural differences are highly complementary. In addition, this study found that
the performance improvement brought by traditional pooling methods is limited and that using a Transformer together with global average pooling degrades performance. To this end, this paper proposes two lightweight, general pooling methods: Global Hybrid Pooling (GHP) and Global Sub-Value Pooling (GSVP). The former takes a softer approach that preserves global information, while the latter, similar to human vision, preserves local discriminative information. Combining these with CMF, this paper proposes a collaborative Transformer-pooling framework (CTP), which achieves superior performance on four mainstream tasks. It is worth noting that GHP and GSVP are also general pooling methods compatible with other convolutional neural networks and vision tasks. (3) For visible thermal cross-modality person re-identification under dark conditions, this paper also introduces the Transformer structure. Specifically, CMF and global sub-value pooling are applied to this task to examine how well the proposed approach performs on a supervised open-world person re-identification problem. Additionally, existing visible thermal cross-modality person re-identification methods pay little attention to local features, and existing local methods obtain local features by uniform horizontal partitioning, which loses the neighborhood information between adjacent local blocks, information that likely contains key discriminative features. To solve this problem, this paper proposes a simple and effective method, Adjacent Local Features (ALF), which preserves pixel-level neighborhood information without adding computational complexity. The efficiency of ALF and the applicability of the proposed pooling strategy are demonstrated by results on two widely used datasets. In summary, studying open-world person re-identification is crucial for putting re-identification technologies into actual production and benefiting society. This paper
takes unsupervised domain adaptation and visible thermal cross-modality person re-identification as examples, aiming to find a series of general methods for open-world person re-identification. Extensive experiments verify the effectiveness and scalability of the proposed methods. |
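
The three classic pooling methods named in part (2) can be illustrated with a minimal NumPy sketch. It uses the standard textbook definitions of global average, global max, and generalized-mean (GeM) pooling; it is not the paper's implementation and does not attempt GHP or GSVP, which the abstract does not define. The function names, the `(C, H, W)` feature-map layout, and the exponent `p = 3` are assumptions chosen for illustration.

```python
import numpy as np


def global_avg_pool(feat):
    """Average each channel of a (C, H, W) feature map over H and W."""
    return feat.mean(axis=(1, 2))


def global_max_pool(feat):
    """Keep only the largest activation of each channel."""
    return feat.max(axis=(1, 2))


def gem_pool(feat, p=3.0, eps=1e-6):
    """Generalized-mean pooling: (mean(x^p))^(1/p) per channel.

    p = 1 reduces to average pooling; larger p moves the result
    toward max pooling. Inputs are clipped to eps so the power is
    well defined (activations are assumed non-negative, e.g. post-ReLU).
    """
    clipped = np.clip(feat, eps, None)
    return np.mean(clipped ** p, axis=(1, 2)) ** (1.0 / p)


# Hypothetical feature map: 4 channels over a 6 x 3 spatial grid.
rng = np.random.default_rng(0)
feat = rng.random((4, 6, 3))

avg_vec = global_avg_pool(feat)
max_vec = global_max_pool(feat)
gem_vec = gem_pool(feat, p=3.0)
```

For non-negative activations the GeM descriptor lies between the average and the max descriptors channel by channel, which is one way to see why giving the two branches different pooling operators yields different, potentially complementary, embeddings.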