Object tracking is one of the fundamental research problems in computer vision and has been widely applied in intelligent video surveillance, human-computer interaction, military reconnaissance, and other fields. Its goal is to locate and follow targets across time and space in single-camera or cross-camera scenarios, which has significant research and practical value for public security, smart cities, and related domains. Although object tracking has made great progress, illumination changes, camera motion, background clutter, and image quality degradation in real scenes mean that tracking in complex scenes remains an open problem. This thesis argues that the main cause of performance degradation in complex scenes is the model's insufficient ability to represent target features. It therefore studies visible-infrared object tracking (single-camera scenes) and nighttime cross-camera object tracking (cross-camera scenes) from the perspective of deep feature enhancement, and investigates four models: a modality decoupling enhancement model and a duality-gated enhancement model for visible-infrared object tracking, and an illumination enhancement fusion model and a multi-domain joint enhancement model for nighttime cross-camera object tracking.

Mainstream visible-infrared object tracking methods usually process each modality independently and under-model modality-shared features. To address this problem, the thesis proposes a visible-infrared object tracking method based on a modality decoupling enhancement model. The method designs three adapters that jointly extract modality-shared, modality-specific, and instance-aware target features in an end-to-end deep learning framework. To decouple and enhance modality features at low computational cost, most parameters are shared for collaborative learning of modality-shared features, while only a small number of parameters model the modality-specific features of each modality. To further improve the decoupling effect, the thesis also designs a hierarchical divergence loss based on the multi-kernel maximum mean discrepancy. Experiments show that the method achieves strong results on three mainstream visible-infrared object tracking datasets.

Existing visible-infrared object tracking methods often rely on modality quality weighting to suppress low-quality modality features, which prevents full use of the discriminative information in each modality. To address this issue, the thesis proposes a duality-gated enhancement model for visible-infrared object tracking. The method constructs a feature enhancement network that strengthens the discriminative feature representation of every modality while suppressing irrelevant noise. Specifically, a feature enhancement module uses discriminative information from one modality as conditional features to guide the feature learning of the other modality, and this guidance is carried out mutually between the two modalities to cope with dynamic changes of modality quality in complex scenes. A duality-gated mechanism is then integrated into the feature enhancement module to improve the quality of the generated conditional features and reduce the influence of data noise. In addition, an optical-flow-based resampling strategy improves the robustness of the model under camera motion challenges. Extensive experiments on four standard visible-infrared object tracking datasets show that the method has significant advantages over mainstream methods.
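As an illustration of the parameter-sharing idea behind the modality decoupling enhancement model, the following PyTorch sketch pairs a shared backbone block with lightweight per-modality adapters. The module names, adapter design, and dimensions are illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal sketch: one shared block carries modality-shared features, while tiny
# per-modality bottleneck adapters add modality-specific corrections.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Low-rank bottleneck adapter: few parameters per modality (hypothetical design)."""

    def __init__(self, dim: int, hidden: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, hidden)
        self.up = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.up(torch.relu(self.down(x)))


class DecoupledBlock(nn.Module):
    """Shared block plus modality-specific adapters for RGB and infrared inputs."""

    def __init__(self, dim: int):
        super().__init__()
        self.shared = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.adapters = nn.ModuleDict({"rgb": Adapter(dim), "ir": Adapter(dim)})

    def forward(self, tokens, modality: str):
        shared_feat = self.shared(tokens)            # modality-shared path (most parameters)
        specific = self.adapters[modality](tokens)   # modality-specific path (few parameters)
        return shared_feat + specific


block = DecoupledBlock(dim=256)
rgb_tokens = torch.randn(2, 196, 256)   # (batch, tokens, channels)
ir_tokens = torch.randn(2, 196, 256)
out_rgb = block(rgb_tokens, "rgb")
out_ir = block(ir_tokens, "ir")
```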
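Under one plausible reading, the hierarchical divergence loss based on the multi-kernel maximum mean discrepancy sums multi-kernel MMD terms between the two modalities' shared features at several backbone stages. The sketch below uses illustrative kernel bandwidths and a simple biased MMD estimator; the thesis's actual kernels and weighting may differ.

```python
# Rough sketch of a multi-kernel (RBF mixture) MMD term and its hierarchical use.
import torch


def mk_mmd(x, y, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Biased multi-kernel MMD^2 between two feature sets of shape (N, D)."""
    n = x.size(0)
    z = torch.cat([x, y], dim=0)
    d2 = torch.cdist(z, z).pow(2)                                   # pairwise squared distances
    k = sum(torch.exp(-d2 / (2 * s ** 2)) for s in sigmas) / len(sigmas)
    kxx, kyy, kxy = k[:n, :n], k[n:, n:], k[:n, n:]
    return kxx.mean() + kyy.mean() - 2 * kxy.mean()


def hierarchical_divergence_loss(rgb_feats, ir_feats, weights=None):
    """Sum MK-MMD over per-stage shared features of the two modalities."""
    weights = weights or [1.0] * len(rgb_feats)
    return sum(w * mk_mmd(r.flatten(1), i.flatten(1))
               for w, r, i in zip(weights, rgb_feats, ir_feats))


# Example: three backbone stages of pooled (batch, channels) features.
rgb = [torch.randn(8, c) for c in (64, 128, 256)]
ir = [torch.randn(8, c) for c in (64, 128, 256)]
loss = hierarchical_divergence_loss(rgb, ir)
```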
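The duality-gated feature enhancement described for the second method might take roughly the following form: conditional features generated from one modality pass through two sigmoid gates before being injected into the other modality, and the guidance is applied in both directions. The specific gate design and parameter sharing are assumptions for illustration.

```python
# Minimal sketch of mutual, duality-gated feature enhancement between modalities.
import torch
import torch.nn as nn


class DualityGatedEnhance(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.cond = nn.Conv2d(channels, channels, 3, padding=1)    # generate conditional features
        self.gate_cond = nn.Conv2d(channels, channels, 1)          # gate 1: clean the condition
        self.gate_inject = nn.Conv2d(2 * channels, channels, 1)    # gate 2: how strongly to inject

    def enhance(self, target, guide):
        cond = self.cond(guide)
        cond = cond * torch.sigmoid(self.gate_cond(guide))                          # first gate
        inject = torch.sigmoid(self.gate_inject(torch.cat([target, cond], dim=1)))  # second gate
        return target + inject * cond

    def forward(self, rgb, ir):
        # Mutual guidance: each modality is enhanced by conditional features from the other.
        return self.enhance(rgb, ir), self.enhance(ir, rgb)


module = DualityGatedEnhance(channels=128)
rgb = torch.randn(2, 128, 32, 32)
ir = torch.randn(2, 128, 32, 32)
rgb_enh, ir_enh = module(rgb, ir)
```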
Existing cross-camera object tracking methods, trained only on daytime scene data, struggle with the complex lighting conditions of nighttime scenes. To address this challenge, the thesis proposes a nighttime cross-camera object tracking method based on an illumination enhancement fusion model. The method improves the performance of existing cross-camera object tracking methods in low-light environments through low-illumination image enhancement and a fusion enhancement strategy. Specifically, the model consists of a main branch, an illumination enhancement branch, and a fusion enhancement module. First, the main branch and the illumination enhancement branch extract night features and enhancement features from nighttime images, respectively. Then, a bottleneck-structured fusion enhancement module fuses the night features and enhancement features thoroughly while suppressing data noise. In addition, the thesis contributes a nighttime cross-camera object tracking dataset named Night600, which contains pedestrian images captured under different viewpoints and lighting conditions in complex night environments. Extensive experiments on Night600 and an existing nighttime cross-camera object tracking dataset demonstrate the superiority and generalization ability of the proposed method.

Existing nighttime cross-camera object tracking methods combine a low-light image enhancement network and a cross-camera object tracking network in a serial manner, so tracking performance is limited by the quality of the enhancement results. To address this problem, the thesis proposes a nighttime cross-camera object tracking method based on a multi-domain joint enhancement model. To avoid the impact of enhanced image quality on the tracking task, the method designs a Transformer-based parallel network that performs cross-camera object tracking and low-light image enhancement simultaneously. Cross-camera object tracking features are effectively enhanced by sharing the low-level network parameters and adding a one-way connection between the high-level peer networks. To address the poor learning performance caused by small-scale real nighttime data, the thesis also proposes a multi-domain learning scheme that optimizes the entire model by alternately sampling real-domain and synthetic-domain data in each iteration. Experimental results show that the method achieves effective improvements on one real and two synthetic nighttime cross-camera object tracking datasets.
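The bottleneck-structured fusion enhancement module of the illumination enhancement fusion model could be sketched as follows: night features and enhancement features are concatenated, squeezed through a low-dimensional bottleneck (which helps discard noise), expanded back, and added to the night features as a residual. Channel sizes and normalization choices are illustrative assumptions.

```python
# Minimal sketch of a bottleneck-style fusion enhancement module.
import torch
import torch.nn as nn


class BottleneckFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = channels // reduction
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, mid, kernel_size=1),     # squeeze concatenated features
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=3, padding=1),   # mix inside the bottleneck
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1),         # expand back
        )

    def forward(self, night_feat, enhanced_feat):
        fused = self.fuse(torch.cat([night_feat, enhanced_feat], dim=1))
        return night_feat + fused        # residual keeps the original night cues


fusion = BottleneckFusion(channels=256)
night = torch.randn(4, 256, 24, 8)       # e.g. a pedestrian feature map
enhanced = torch.randn(4, 256, 24, 8)
out = fusion(night, enhanced)
```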
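A rough sketch of the Transformer-based parallel network in the multi-domain joint enhancement model is given below, assuming that the two branches share their low-level blocks and that the one-way connection injects each high-level enhancement feature into the peer tracking block. The depth split, connection direction, and block design are all assumptions.

```python
# Minimal sketch: shared low-level blocks, parallel high-level branches,
# and a one-way additive connection from the enhancement branch to tracking.
import torch
import torch.nn as nn


def make_block(dim):
    return nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)


class ParallelJointNet(nn.Module):
    def __init__(self, dim=256, shared_depth=4, high_depth=4):
        super().__init__()
        self.shared = nn.ModuleList([make_block(dim) for _ in range(shared_depth)])
        self.track_high = nn.ModuleList([make_block(dim) for _ in range(high_depth)])
        self.enh_high = nn.ModuleList([make_block(dim) for _ in range(high_depth)])

    def forward(self, tokens):
        for blk in self.shared:                  # low-level parameters are shared
            tokens = blk(tokens)
        t, e = tokens, tokens
        for t_blk, e_blk in zip(self.track_high, self.enh_high):
            e = e_blk(e)
            t = t_blk(t + e)                     # one-way connection: enhancement -> tracking
        return t, e                              # tracking features, enhancement features


net = ParallelJointNet()
tokens = torch.randn(2, 128, 256)                # (batch, tokens, channels)
track_feat, enh_feat = net(tokens)
```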
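The multi-domain learning scheme, as described, alternates between real-domain and synthetic-domain batches at every iteration; a minimal training-loop sketch with a hypothetical compute_loss interface follows.

```python
# Minimal sketch of alternating real/synthetic domain sampling per iteration.
import itertools


def train_multi_domain(model, real_loader, synth_loader, optimizer, iterations):
    real_iter = itertools.cycle(real_loader)     # the small real nighttime set is recycled
    synth_iter = itertools.cycle(synth_loader)
    for step in range(iterations):
        # Alternate domains: even steps draw real data, odd steps draw synthetic data.
        images, labels = next(real_iter) if step % 2 == 0 else next(synth_iter)
        loss = model.compute_loss(images, labels)  # hypothetical joint tracking/enhancement loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```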