Research On Modality Consistency And Feature Robustness For Cross-modal Person Re-identification

Posted on: 2024-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: L B Shi
Full Text: PDF
GTID: 2568307109987959
Subject: Artificial Intelligence
Abstract/Summary:
Person re-identification (ReID) is the task of determining whether pedestrian images captured from different camera viewpoints belong to the same person. ReID is challenging due to variations in viewpoint, pose, lighting, and background. Most existing research focuses on matching pedestrian images captured by visible-light cameras, which is the single-modality ReID problem. In intelligent surveillance systems, however, a visible-light camera alone is not sufficient: under poor lighting conditions (such as at night), it is difficult to extract discriminative pedestrian information from visible-light images. Advanced surveillance systems can therefore switch automatically from visible-light mode to infrared mode and capture infrared pedestrian images that still provide effective appearance information. Because the two imaging principles differ, there is a significant modality discrepancy between visible-light and infrared images. Mitigating this discrepancy while extracting rich pedestrian identity features is therefore critical to the performance of cross-modal ReID models. To this end, the work of this paper is as follows:

(1) A cross-modal person re-identification method is proposed that consists of two modules: modality-invariant feature learning and consistent fine-grained information mining. The modality-invariant feature learning module first removes modality information from the feature map to alleviate the impact of the modality discrepancy. The consistent fine-grained information mining module then performs channel grouping and horizontal partitioning on the feature map to fully mine discriminative fine-grained information and achieve semantic alignment. Working together, the two modules enable the feature extraction network to obtain pedestrian features that are unaffected by the modality discrepancy and more discriminative. Experimental results show that the proposed model outperforms current state-of-the-art cross-modal person re-identification methods.
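To make the channel grouping and horizontal partitioning concrete, the following PyTorch sketch shows one plausible form of such a fine-grained mining module; the class name, group and stripe counts, and per-part embedding heads are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn

class FineGrainedMining(nn.Module):
    """Hypothetical part-mining head: channel grouping + horizontal stripes."""
    def __init__(self, in_channels=2048, num_groups=4, num_stripes=6, embed_dim=256):
        super().__init__()
        self.num_groups = num_groups
        self.num_stripes = num_stripes
        # One embedding head per (channel group, horizontal stripe) part, so
        # corresponding parts of visible and infrared images are compared in
        # the same embedding space (semantic alignment).
        self.heads = nn.ModuleList(
            nn.Linear(in_channels // num_groups, embed_dim)
            for _ in range(num_groups * num_stripes)
        )

    def forward(self, feat):  # feat: (B, C, H, W) backbone feature map
        parts = []
        for g, group in enumerate(feat.chunk(self.num_groups, dim=1)):          # channel grouping
            for s, stripe in enumerate(group.chunk(self.num_stripes, dim=2)):   # horizontal partitioning
                pooled = stripe.mean(dim=(2, 3))  # average-pool each local part
                parts.append(self.heads[g * self.num_stripes + s](pooled))
        return torch.stack(parts, dim=1)          # (B, num_groups*num_stripes, embed_dim)

# Example: a ResNet-50-style feature map of a 384x128 pedestrian crop.
feat = torch.randn(8, 2048, 24, 8)
part_embeddings = FineGrainedMining()(feat)       # -> (8, 24, 256)
```

A part-level identity loss and a cross-modal alignment loss would then presumably be applied per part embedding; the sketch stops at feature extraction.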
(2) A cross-modal person re-identification method based on Transformer cross-modal information propagation and multi-head attention cooperation is proposed. The current mainstream approach adopts a dual-stream network that shares part of its parameters to map the feature maps of the two modalities into a common feature space and thereby mitigate the modality discrepancy; however, discarding modality information in this way may also discard useful pedestrian information. In addition, mainstream methods pay little attention to local pedestrian features, such as glasses and watches, or to the correlations among them, even though these local features carry latent correlations and contextual information, such as the overall appearance of the pedestrian, that matter for re-identification. The proposed method therefore preserves modality information during pedestrian feature extraction and then fuses the modality information of the two modalities so that both carry equivalent modality information, mitigating the modality discrepancy. A multi-head attention cooperation module is also designed to focus on local features and extract the correlation information among them, further improving re-identification accuracy. Experimental results show that the method performs well on two public datasets.
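The multi-head attention cooperation over local features can likewise be sketched with standard Transformer components. In the hedged example below, visible and infrared part tokens are concatenated and passed through multi-head self-attention, so information propagates both across local features and across modalities; all names and dimensions are assumptions for illustration, not the thesis's actual design.

```python
import torch
import torch.nn as nn

class AttentionCooperation(nn.Module):
    """Hypothetical Transformer block over local-part tokens of both modalities."""
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, vis_tokens, ir_tokens):  # each: (B, N, dim) local-part tokens
        # Concatenating the two modalities' token sequences lets self-attention
        # propagate information across modalities as well as across local
        # features (e.g. glasses, watches); each head can specialise on a
        # different correlation pattern, i.e. the heads "cooperate".
        tokens = torch.cat([vis_tokens, ir_tokens], dim=1)  # (B, 2N, dim)
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm1(tokens + attended)              # residual + norm
        return self.norm2(tokens + self.ffn(tokens))

vis = torch.randn(8, 24, 256)  # e.g. part embeddings from visible images
ir = torch.randn(8, 24, 256)   # part embeddings from infrared images
fused = AttentionCooperation()(vis, ir)                     # -> (8, 48, 256)
```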
Keywords/Search Tags:cross-modal person re-identification, modal difference, fine-grained information, semantic consistency, multi-head attention cooperation