As artificial intelligence models proliferate in real-world applications, AI technologies, with deep learning as their flagship, have achieved remarkable breakthroughs across many domains. However, many deep learning algorithms were designed without adequate consideration of potential security threats, leaving significant security risks in deployed deep learning systems. One of the most notable threats is the vulnerability of deep learning to adversarial examples: this flaw enables malicious attackers to easily manipulate the decisions of machine learning algorithms, causing AI systems to misclassify. Because adversarial attacks pose significant potential harm to deep learning systems, assuring the security and robustness of deep learning models has become increasingly urgent. To delve deeper into the intrinsic mechanisms behind the vulnerability of deep learning to adversarial examples, this paper conducts research on both sides of the problem, namely offense and defense. Around this central theme, the paper undertakes comprehensive research in four directions: cross-model attacks, multi-task attacks, adversarial purification defense, and enhancing adversarial robustness. The aim is to provide new insights and solutions for the challenging security issues that adversarial attacks pose to current deep learning. Specifically, the four research directions and their contributions are as follows.

(1) To address the limited transferability of adversarial examples, caused by existing attack methods considering only a single target model, this paper proposes an adversarial example generation approach based on a multi-agent generative adversarial network. By introducing multiple surrogate (agent) discriminators, the attack model can adapt to the decision boundaries of various target models, which provides richer gradient information for generating adversarial perturbations. To prevent the collapse of adversarial patterns in generative adversarial networks, we introduce a latent-data distance constraint that enforces consistency between adversarial samples in the latent space and in the data space. Experimental results demonstrate that this method achieves superior attack transferability across different target models (a code sketch of the objective follows after contribution (2) below).

(2) To address the difficulty of defining an optimization objective in multi-task attacks, where different tasks have different objective functions, this paper introduces a novel attack paradigm termed the Multi-Task Adversarial Attack. Unlike traditional attack methods, it relies on neither a specific task's loss function nor a surrogate attack model; instead, it learns adversarial patterns from preserved relational representations. Specifically, we design a Relation Preservation Module that maps samples into a low-dimensional embedding space while preserving their intrinsic geometric structure; its function is to eliminate redundant information from high-dimensional features, providing an efficient latent space for inferring adversarial patterns. To learn adversarial representations in this latent space, we introduce a novel adversarial mechanism that is not constrained by any specific task's loss function or attack surrogate. Extensive experimental results demonstrate that the Multi-Task Adversarial Attack outperforms current state-of-the-art general and transferable attack strategies.
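To make the mechanism in contribution (1) concrete, the following is a minimal PyTorch sketch of a multi-surrogate attack objective. It is an assumed form for illustration, not the thesis's exact model: `PerturbationGenerator`, the perturbation budget `eps`, and the averaging over `surrogates` are all illustrative choices, and the latent-data distance constraint is shown only as a placeholder term.

```python
# Hedged sketch of a multi-surrogate attack objective (assumed form).
# A generator maps a clean image to a bounded adversarial example; several
# frozen surrogate classifiers stand in for unknown target models.
import torch
import torch.nn as nn

class PerturbationGenerator(nn.Module):
    """Tiny convolutional generator emitting an L-inf bounded perturbation."""
    def __init__(self, eps=8 / 255):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Tanh(),  # output in [-1, 1]
        )

    def forward(self, x):
        return x + self.eps * self.net(x)  # adversarial example

def multi_surrogate_loss(gen, surrogates, x, y):
    """Untargeted attack loss aggregated over several surrogate models.

    Maximizing every surrogate's cross-entropy pushes the perturbation
    toward regions that cross many decision boundaries at once, which is
    the intuition behind improved transferability.
    """
    x_adv = gen(x).clamp(0, 1)
    ce = nn.CrossEntropyLoss()
    # Negative CE: the generator is trained to *increase* each surrogate's error.
    return -sum(ce(f(x_adv), y) for f in surrogates) / len(surrogates)

def latent_consistency(z_adv, z_data):
    # Placeholder for the latent-data distance constraint: keep latent-space
    # adversarial codes close to their data-space counterparts.
    return (z_adv - z_data).pow(2).mean()
```

In practice the surrogates would be pretrained classifiers with frozen weights, and the two terms would be combined with a weighting hyperparameter.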
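The Relation Preservation Module of contribution (2) can be sketched in the same spirit. A pairwise cosine-similarity matching loss is assumed here as the "relation preservation" criterion, and the task-agnostic attack is rendered as maximizing relational distortion; both are plausible but hypothetical instantiations, not the thesis's exact formulation.

```python
# Hedged sketch of relation preservation and a task-agnostic relational attack.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationPreservationModule(nn.Module):
    """Compresses features while keeping their pairwise geometric structure."""
    def __init__(self, in_dim=2048, emb_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, emb_dim))

    def forward(self, feats):
        return self.encoder(feats)

def _pairwise_cosine(x):
    # (B, D) -> (B, B) cosine-similarity matrix.
    return F.cosine_similarity(x.unsqueeze(1), x.unsqueeze(0), dim=-1)

def relation_preservation_loss(feats, emb):
    """Match the similarity structure of features and low-dim embeddings."""
    return F.mse_loss(_pairwise_cosine(emb), _pairwise_cosine(feats))

def relation_disruption_objective(emb_clean, emb_adv):
    # Assumed attack objective: no task loss, no surrogate; the perturbation
    # is optimized purely to distort the preserved relational structure.
    return -F.mse_loss(_pairwise_cosine(emb_adv), _pairwise_cosine(emb_clean))
```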
(3) In response to the shortcomings of existing defense methods, such as high cost, significant latency, and weak defensive performance, this paper introduces a novel plug-and-play adversarial purification model called the Diffusion Filter. To counteract Gaussian noise perturbations effectively while preserving the genuine semantic information of input images, we adopt forward diffusion and extend it to a continuum of noise scales, allowing the distribution of perturbed data to evolve over an expanding range of noise through stochastic differential equations. In the reverse denoising process, we employ a score-based model to restore the input's prior distribution back to the data distribution of the original clean samples, achieving more robust purification (a sketch of this procedure follows after contribution (4) below). Furthermore, to speed up the reverse process, we propose an efficient sampling method that significantly reduces the time cost of purification. Experimental results demonstrate that the Diffusion Filter not only surpasses existing defense methods in robustness against strong adaptive attacks and query attacks but also achieves higher certified robustness than baseline methods.

(4) Given that adversarial training can affect a model's decision uncertainty, and in turn the reliability of the confidence scores people place in model predictions, this paper first conducts empirical studies showing that adversarial examples not only mislead undefended models in attack scenarios, making them overconfident in incorrect decisions, but also cause adversarially trained models to become more risk-averse. To enhance the model's adversarial robustness and confidence calibration simultaneously, we propose an adversarial calibration entropy as a regularizer for the cross-entropy loss. Extensive experiments demonstrate that this regularization not only increases the model's confidence in correct decisions but also matches current state-of-the-art models in adversarial robustness. This research provides an effective approach for jointly optimizing adversarial robustness and confidence calibration.
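For contribution (3), a DiffPure-style purification loop conveys the forward-diffuse-then-reverse-denoise idea. The linear VP-SDE schedule, the stopping time `t_star`, and the `score_model` interface are assumptions for illustration; the thesis's efficient sampler is replaced here by plain Euler-Maruyama integration of the reverse-time SDE.

```python
# Hedged sketch of diffusion-based adversarial purification (assumed VP-SDE).
import torch

def purify(x_adv, score_model, t_star=0.1, n_steps=20,
           beta_min=0.1, beta_max=20.0):
    """Forward-diffuse to time t_star, then integrate the reverse SDE to 0.

    The forward noising drowns out adversarial perturbations at a scale small
    enough that image semantics survive; the score model then pulls the sample
    back onto the clean-data distribution.
    """
    def beta(t):  # linear VP-SDE noise schedule
        return beta_min + t * (beta_max - beta_min)

    # Forward VP-SDE marginal at t_star: x_t = mean_coef * x_0 + std * z.
    t = torch.tensor(t_star)
    log_coef = -0.25 * t ** 2 * (beta_max - beta_min) - 0.5 * t * beta_min
    mean_coef = torch.exp(log_coef)
    std = torch.sqrt(1.0 - mean_coef ** 2)
    x = mean_coef * x_adv + std * torch.randn_like(x_adv)

    # Reverse-time SDE, Euler-Maruyama discretization from t_star down to 0.
    dt = t_star / n_steps
    for i in range(n_steps):
        t_i = t_star - i * dt
        b = beta(torch.tensor(t_i))
        score = score_model(x, torch.full(x.shape[:1], t_i))  # approx. grad log p_t(x)
        drift = -0.5 * b * x - b * score
        x = x - drift * dt + torch.sqrt(b * dt) * torch.randn_like(x)
    return x
```

The purified output `x` would then be passed to any downstream classifier unchanged, which is what makes the defense plug-and-play.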
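Contribution (4) admits a similar illustration under one plausible reading of the regularizer. Since the exact formula for the adversarial calibration entropy is not given here, the sketch below simply adds a predictive-entropy penalty on correctly classified adversarial examples to the adversarial cross-entropy loss; the specific form, the subset restriction, and the weight `lam` are all assumptions.

```python
# Hedged sketch of entropy-regularized adversarial training (assumed form of
# the "adversarial calibration entropy").
import torch
import torch.nn.functional as F

def adversarial_calibration_loss(logits_adv, y, lam=0.5):
    """Cross-entropy on adversarial examples plus an entropy regularizer.

    Predictive entropy is lowered only on correctly classified adversarial
    inputs, countering the risk-averse, underconfident behavior of
    adversarially trained models without making the model confident
    in its mistakes.
    """
    ce = F.cross_entropy(logits_adv, y)
    probs = F.softmax(logits_adv, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    correct = (logits_adv.argmax(dim=-1) == y).float()
    # Mean entropy over the correctly predicted subset only.
    calib = (entropy * correct).sum() / correct.sum().clamp_min(1.0)
    return ce + lam * calib
```

During PGD-based adversarial training, `logits_adv` would be the model's outputs on the PGD-perturbed batch, so robustness and calibration are optimized by the same objective.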