
Research On The Adversarial Attacks And Defenses Of Neural Network Classifier

Posted on: 2022-08-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z B Yi
Full Text: PDF
GTID: 1528307169477624
Subject: Computer Science and Technology
Abstract/Summary:
Although deep learning technologies based on neural networks have achieved breakthroughs in many areas of life, adversarial attacks against neural networks threaten real-world artificial intelligence applications. Applications such as face recognition, autonomous driving, and social comment detection can suffer serious consequences if they are misled by adversarial examples. It is therefore necessary to investigate the threat that adversarial attacks pose to image and text classifiers, and to study attack techniques thoroughly so that security risks can be prevented or removed in time. Likewise, improving the robustness of neural networks against adversarial attacks requires a thorough study of defenses against adversarial examples.

This thesis first studies adversarial attacks, starting from image recognition, where such attacks are most widely applied. Existing adversarial attacks on images require either the internal details of the neural network or the training of a substitute network, whose overhead is high. To reveal the vulnerability of black-box neural networks that hide their internals, it is worth exploring whether a more general and economical way to attack them exists. This thesis innovatively applies particle swarm optimization to the adversarial example attack problem. Experiments show that the resulting black-box adversarial attack achieves a higher attack success rate with less perturbation.
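As an illustration of this idea, the following is a minimal sketch of a PSO-driven black-box attack that queries the classifier only through its output probabilities. The stand-in classifier black_box_probs, the fitness weighting, and the PSO constants are assumptions made for illustration, not the thesis's exact formulation.

```python
# Sketch of a black-box adversarial attack via particle swarm optimization.
# Assumption: `black_box_probs` stands in for the real target model, which
# would normally be queried remotely; here it is a fixed random linear
# classifier so the script is self-contained and runnable.
import numpy as np

rng = np.random.default_rng(0)
D, C = 64, 10                       # flattened image size, number of classes
W = rng.normal(size=(D, C))

def black_box_probs(x):
    # Placeholder black box: linear logits followed by softmax.
    logits = x.reshape(len(x), -1) @ W
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

x0 = rng.uniform(0, 1, size=D)      # clean example
true_label = int(np.argmax(black_box_probs(x0[None])[0]))

def fitness(perturbs):
    # Lower is better: confidence in the true class plus a penalty on the
    # perturbation size, so small misleading perturbations win out.
    probs = black_box_probs(np.clip(x0 + perturbs, 0, 1))
    return probs[:, true_label] + 0.1 * np.linalg.norm(perturbs, axis=1)

n, w, c1, c2 = 30, 0.7, 1.5, 1.5    # swarm size and standard PSO constants
pos = rng.normal(scale=0.05, size=(n, D))  # particles = candidate perturbations
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), fitness(pos)
gbest = pbest[np.argmin(pbest_fit)]

for _ in range(200):
    r1, r2 = rng.uniform(size=(2, n, 1))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -0.2, 0.2)    # keep perturbations small
    fit = fitness(pos)
    better = fit < pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fit[better]
    gbest = pbest[np.argmin(pbest_fit)]

adv = np.clip(x0 + gbest, 0, 1)
print("label:", true_label, "->", int(np.argmax(black_box_probs(adv[None])[0])))
```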
To better investigate the impact of adversarial attacks on image and text classifiers, this thesis proposes a multimodal adversarial attack framework that can generate both text and image adversarial examples. In this framework, text and images are unified into a single data structure, the tensor. The thesis proposes an improved saliency map to measure the influence of each pixel in an image, or each word in a text, on the classification result. The saliency map determines the modification priority of each pixel or word, and candidate examples are generated according to that priority. Several search methods (beam search, particle swarm optimization, and a genetic algorithm) are then used to find adversarial examples. Experiments show that this method works for both image and text attacks: beam search attains higher attack efficiency, while the genetic algorithm has lower time overhead.

Existing text attacks are often neither efficient nor robust enough. Efficiency refers to achieving a higher attack success rate with smaller modifications; robustness refers to maintaining a good attack success rate even when attacking a defended classifier. This thesis innovatively proposes a saliency map attack based on a Levenshtein edit distance similarity network (SMAL) to improve the efficiency and robustness of text adversarial attacks. Experiments show that SMAL achieves a higher attack success rate with a smaller edit distance and retains its attack effect against defended classifiers.

To detect image adversarial examples, this thesis proposes a detection method based on incremental training of a generative adversarial network (GAN). The method mainly uses the GAN's discriminator: with suitable modifications, the discriminator can distinguish normal examples from adversarial examples, and through incremental training it can be extended to detect adversarial examples from new attacks. Another important issue is that only a small number of examples of a novel attack are usually available; the thesis addresses this with a data augmentation technique based on the Jacobian matrix. Experiments show that the detection method can detect various types of adversarial attacks even when only a small number of adversarial examples are used for training.

Current text defense methods have two main problems: they usually reduce classifier accuracy, and their defense is not effective enough. This thesis innovatively proposes a stability fine-tuning framework that defends against word-level adversarial attacks while maintaining classification accuracy on the original examples. Stability is measured by the probability change caused by slightly perturbed examples drawn from a minor modification set: the smaller the change, the more stable the classifier. Stability ensures that a slight modification of an example does not cause too large a change in the predicted probability distribution. The framework also uses a new loss function, a trade-off between the original loss and a stability loss, to ensure both accuracy and stability. Experiments show that this defense outperforms existing methods in classifier accuracy, defense against attacks, and discrimination of transferable adversarial examples.
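To make the trade-off concrete, the following is a minimal sketch of such a combined objective in PyTorch. The KL-divergence stability term, the alpha weight, and the perturb_batch helper are assumptions made for illustration; the thesis's exact loss formulation is not reproduced here.

```python
# Sketch of a stability fine-tuning objective: a weighted trade-off between
# the original classification loss and a stability term that penalizes
# probability shifts on slightly perturbed inputs.
# Assumption: `perturb_batch` is a hypothetical helper standing in for the
# thesis's "minor modification set" (e.g., synonym swaps for text).
import torch
import torch.nn.functional as F

def stability_finetune_loss(model, x, y, perturb_batch, alpha=0.5):
    logits = model(x)
    original_loss = F.cross_entropy(logits, y)

    # Probability change between clean and perturbed examples; the smaller
    # the change, the more stable the classifier. KL divergence is one
    # plausible measure of that change, assumed here for illustration.
    p_clean = F.softmax(logits, dim=-1)
    p_pert = F.softmax(model(perturb_batch(x)), dim=-1)
    stability_loss = F.kl_div(p_pert.log(), p_clean, reduction="batchmean")

    # Trade-off between accuracy (original loss) and stability.
    return (1 - alpha) * original_loss + alpha * stability_loss
```

During fine-tuning this loss replaces the plain cross-entropy objective, so gradient steps simultaneously preserve accuracy on clean examples and flatten the model's response to small perturbations.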
Keywords/Search Tags:Neural Network Classifier, Image Adversarial Attack, Text Adversarial Attack, Multimodal, Adversarial Example Detection, Stability Fine-tuning