| Video Object Segmentation (VOS) is an important research topic in computer vision, with significant application value in intelligent video editing, autonomous driving, surveillance, and other areas. In recent years, with the widespread practical application of deep learning in computer vision, a growing number of deep-learning-based video object segmentation algorithms have emerged. However, deep learning models are vulnerable to malicious adversarial attacks, which mislead a model into making wrong decisions by adding perturbations to the input image that humans cannot perceive. Such attacks can effectively reduce the classification accuracy of neural networks used for image classification. Attacking video object segmentation algorithms, by contrast, requires making the object hard to distinguish from the background, which poses significant challenges. First, existing adversarial attack methods for image classification assume that object categories are known and only need to make the classifier misclassify. In VOS tasks, however, the object category is unknown, so directly applying image-classification attack methods rarely achieves effective attack performance. Second, attacking a VOS model frame by frame takes a large amount of time, making it difficult to quickly generate imperceptible adversarial perturbations that transfer across frames; this poses an efficiency challenge. The existence of adversarial examples poses a serious challenge to the robustness of video object segmentation algorithms, and this threat indicates that attention should be paid to the security of deep-learning-based video object segmentation models. In response, this thesis examines the security of existing video object segmentation algorithms from two perspectives: an attention-based white-box attack method and a contrastive-loss-based black-box attack method. The
research content of the thesis includes the following aspects: 1. To address the problem of unknown attack object categories, an attention-guided white-box attack method for semi-supervised video object segmentation is proposed. It attempts to learn the regions of the video where the foreground object and the background are confused, which enhances attack performance and addresses the large differences between object and background regions. First, a spatial attention module captures globally dependent features and constructs correlations between consecutive video frames, and a multipath aggregation module effectively fuses spatiotemporal perturbations, thereby reducing the feature differences between foreground objects and backgrounds. Second, a deconvolution network is guided to generate adversarial examples with strong attack ability; a class loss function designed to train this network better activates noise in other background regions and, based on the enhanced feature map of the object class, suppresses activations related to the object class. In addition, an attention feature loss is designed to improve the transferability of the adversarial attack. Experiments were conducted on two publicly available video object segmentation datasets, DAVIS 2016 and DAVIS 2017. The results show that the proposed attention-guided adversarial attack method significantly reduces video object segmentation accuracy. On DAVIS 2016, J&F (the mean of region similarity J and contour accuracy F) decreased from 80.2 to 21.2, a drop rate of 73.6%; on DAVIS 2017, J&F decreased from 60.3 to 22.1, a drop rate of 63.4%. The generated adversarial examples also transfer to other video object segmentation algorithms. 2. To address the low efficiency of video data processing, a black-box attack method based on contrastive loss for self-supervised video object segmentation is proposed, which reduces the
time overhead by generating perturbations only on the salient regions of selected video frames. First, the affinity matrix of the self-supervised video object segmentation model is used to learn the feature representation of the video sequence, and the adversarial perturbation is initialized with random noise from a noise generator. Second, to account for consistency across the video sequence, subsequent iterations optimize contrastive losses based on a single frame, any two consecutive frames, and multiple frames, and a feature loss is used to enhance the transferability of the adversarial examples. At the same time, a pixel-level loss keeps the iteratively optimized adversarial video noise imperceptible. Finally, the video object segmentation model is attacked with only slight perturbations, which significantly degrades the predicted segmentation results for the video sequence. The experimental results show that the proposed black-box attack method attacks efficiently: the J&F value of the attacked self-supervised video object segmentation model decreases by 44.4% on DAVIS 2016 and 50.1% on DAVIS 2017, a significant reduction in segmentation accuracy. |
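
The drop rates quoted in the abstract can be checked with a short calculation. The sketch below assumes that "drop rate" means the relative decrease of the J&F score after the attack, which is consistent with the reported figures but is not stated explicitly in the abstract. The function names `j_and_f` and `drop_rate` are hypothetical helpers introduced here for illustration only.

```python
def j_and_f(j_mean: float, f_mean: float) -> float:
    """Return J&F, the mean of region similarity J and contour accuracy F."""
    return (j_mean + f_mean) / 2.0


def drop_rate(clean: float, attacked: float) -> float:
    """Relative decrease of a score after an attack, in percent.

    Assumption: drop rate = (clean - attacked) / clean * 100.
    """
    if clean <= 0:
        raise ValueError("clean score must be positive")
    return (clean - attacked) / clean * 100.0


if __name__ == "__main__":
    # J&F before and after the white-box attack, as reported in the abstract.
    reported = [("DAVIS 2016", 80.2, 21.2), ("DAVIS 2017", 60.3, 22.1)]
    for name, clean, attacked in reported:
        print(f"{name}: J&F {clean} -> {attacked}, "
              f"drop rate {drop_rate(clean, attacked):.1f}%")
```

Because the inputs here are the rounded scores from the abstract, the recomputed values may differ from the reported drop rates in the last decimal place.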