| With deep neural networks, trackers benefit from powerful feature extractors, bounding box regression, and mask prediction, and have achieved significant breakthroughs. However, distinguishing similar instances remains an important unsolved challenge. Building on deep neural network tracking methods, this dissertation enhances the separability of different instances and the similarity of the same instance from three aspects: feature fusion, feature extraction, and feature decoupling. For feature fusion, we propose an instance-based feature pyramid single object tracking method. Guided by instance information, this method adaptively fuses deep and shallow layer features: it computes the correlation response layer by layer and mixes the features from different layers in a serial manner, yielding enhanced instance features with better separability and higher localization precision. Specifically, the proposed method takes the Siamese network tracker as the baseline and integrates an instance-based upsampling module and a compressed space channel select module into it. The new method exploits both the discriminative deep layer features and the high-resolution shallow layer features, improving localization precision while preserving instance separability. For feature extraction, we propose a bilinear feature based single object tracking method. In this method, the tracker extracts high-order features, which provide more cues to enrich detail information and improve instance separability. Specifically, to speed up bilinear feature extraction, we formulate and analyze the bilinear feature encoding process and design lightweight self-bilinear and cross-bilinear encoders. The self-bilinear encoder learns the co-occurrence relationships between different feature channels to improve instance separability. The cross-bilinear
encoder extracts the cross attention between the template and the search region, highlighting the foreground via instance information and improving classification reliability. For feature decoupling, we propose a single object tracking method with decoupled category awareness and instance awareness. This method integrates contrastive learning, which constructs inter-video sample pairs to strengthen the ability to distinguish different instances. Specifically, the proposed method designs a category awareness module and an instance awareness module, which perform feature embedding at the category level and the instance level, respectively. The category awareness module learns category information from a limited set of known categories and embeds it with a category encoder to stabilize the feature representation. The instance awareness module integrates contrastive learning, mines inter- and intra-video instance-discriminative information, and embeds the instance information with an instance encoder to improve instance separability. For the multiple object tracking task, we propose a joint detection and multiple people tracking method based on semantic and scene information, which extracts features at both the category and instance levels with explicit supervision signals to enhance the discriminability of the instance features. Specifically, through effective detector-tracker interaction and a carefully designed trajectory management strategy, the proposed method manages trajectory status jointly to improve localization precision and trajectory continuity. We also design a scene viewpoint model that adjusts the confidence scores of the predicted bounding boxes according to their location and height. The scene viewpoint model can rule out some abnormal boxes to suppress false positives. On several public benchmarks, the above methods effectively improve tracking performance. The visualization results and ablation studies
further demonstrate the effectiveness and robustness of the proposed methods. |
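
As an illustration of the bilinear encoding idea mentioned in the abstract, the following is a minimal sketch and not the dissertation's implementation. It assumes features are stored as C channels of N spatial responses; the function names `self_bilinear` and `cross_bilinear_response` are hypothetical. The first function computes a channel co-occurrence matrix (the standard bilinear pooling form), and the second computes a response for each search position that is bilinear in the template and search features.

```python
def self_bilinear(features):
    """Channel co-occurrence matrix of one feature map.

    features: list of C channels, each a list of N spatial responses.
    Returns a C x C matrix whose (i, j) entry is the product of channels
    i and j averaged over all spatial positions (assumed pooling choice).
    """
    c = len(features)
    n = len(features[0])
    return [
        [sum(features[i][k] * features[j][k] for k in range(n)) / n
         for j in range(c)]
        for i in range(c)
    ]


def cross_bilinear_response(template, search):
    """Per-position response of the search region to the template.

    template: C x Nz feature map, search: C x Nx feature map.
    For every search position p, average the inner products between the
    search feature at p and the template features at all Nz positions.
    """
    c = len(template)
    nz = len(template[0])
    nx = len(search[0])
    return [
        sum(template[ch][k] * search[ch][p]
            for k in range(nz) for ch in range(c)) / nz
        for p in range(nx)
    ]


if __name__ == "__main__":
    # Toy inputs with 2 channels; values are illustrative only.
    feats = [[1.0, 2.0], [3.0, 4.0]]
    print(self_bilinear(feats))

    template = [[1.0, 0.0], [0.0, 1.0]]
    search = [[1.0, 0.0, 2.0], [1.0, 0.0, 0.0]]
    print(cross_bilinear_response(template, search))
```

A full C x C co-occurrence matrix grows quadratically with the channel count, which is one reason a lightweight encoder, as the abstract describes, is attractive in practice.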