Current deep neural networks (DNNs) are easily fooled by adversarial examples, which are generated by adding tiny, carefully crafted, imperceptible perturbations to clean examples. These malicious samples can mislead deep learning (DL) models into making wrong predictions without being noticed by humans. Once a deep learning model is attacked, it can cause large economic losses and security problems, so the security of deep learning is a current research focus. At present, the main idea of white-box attacks in the image domain is to compute the gradient and add a global perturbation in the gradient direction, which achieves a high attack success rate. However, global perturbations also have shortcomings, such as excessive perturbation magnitude and easy detection by humans. On the defense side, feature squeezing is an adversarial example detection method that identifies adversarial examples by comparing the difference between samples before and after squeezing, but its defense ability against L0-norm attacks is somewhat insufficient. In view of the shortcomings of the above methods, this paper studies the efficiency, the balance between performance and perturbation, and the localization of adversarial attack and defense methods. The main work is as follows:

(1) A local white-box attack method based on interpretability is proposed. This method introduces the concept of interpretability in artificial intelligence, uses the class-activation-mapping-based method Grad-CAM to provide visual explanations of model decisions, and optimizes existing gradient-based white-box attack methods. Perturbations are added mainly to the important regions of the image, forming local adversarial examples. Using a variety of attack methods and networks on the ImageNet 2012 dataset, compared with global attack methods, this method reduces the perturbation size by 9%-24% while keeping the success rate within a 3% fluctuation, and the peak signal-to-noise ratio and structural similarity are also significantly improved. The method achieves a high success rate while affecting fewer pixels, and has a better attack effect on real images with complex textures.

(2) A local feature squeezing defense method based on interpretability is proposed. The method addresses the deficiencies of the feature squeezing defense by using Grad-CAM to interpret the image and then performing local feature squeezing on its important regions. This local squeezing enlarges the difference between adversarial examples and clean examples before and after squeezing, thereby improving the detection rate of adversarial examples. Combined with the visualized saliency map, the method applies three feature squeezers to the regions of interest in the image and computes the norm difference before and after squeezing to detect adversarial examples. Experiments show that on public image datasets, under various L0-norm attacks, the method improves the detection rate by about 20% compared with the existing feature squeezing method.

When a DNN makes a decision, different regions of the input have different influences on the result. Therefore, this paper applies interpretability techniques to DNNs and uses visual saliency maps to understand the basis of the network's decisions and the reason adversarial examples succeed. Combining the interpretability of artificial intelligence with adversarial attack and defense, this paper proposes a local white-box attack method and a local feature squeezing defense method.
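To make the idea behind contribution (1) concrete, the following is a minimal sketch (not the thesis implementation) of a Grad-CAM-masked single-step gradient attack in PyTorch: the FGSM perturbation is kept only where the saliency map is large, so the adversarial noise stays local to the regions the model attends to. The network (ResNet-50), the target layer, and the values of eps and keep_ratio are illustrative assumptions, and image preprocessing/normalization is omitted.

import torch
import torch.nn.functional as F
from torchvision import models


def grad_cam_mask(model, target_layer, x, label, keep_ratio=0.3):
    """Return a binary (1, 1, H, W) mask covering the most salient pixels for `label`."""
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    logits = model(x)
    model.zero_grad()
    logits[0, label].backward()
    h1.remove(); h2.remove()
    a, g = acts[0], grads[0]                        # (1, C, h, w) activations and gradients
    weights = g.mean(dim=(2, 3), keepdim=True)      # channel importance (global average pooling)
    cam = F.relu((weights * a).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    thresh = torch.quantile(cam.flatten(), 1.0 - keep_ratio)  # keep the top `keep_ratio` of pixels
    return (cam >= thresh).float()


def local_fgsm(model, target_layer, x, label, eps=8 / 255, keep_ratio=0.3):
    """One-step FGSM whose sign perturbation is applied only inside the Grad-CAM mask."""
    mask = grad_cam_mask(model, target_layer, x, label, keep_ratio)
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), torch.tensor([label]))
    loss.backward()
    # Perturb only the salient region; pixel values are assumed to lie in [0, 1].
    return (x + eps * x_adv.grad.sign() * mask).clamp(0, 1).detach()


if __name__ == "__main__":
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
    x = torch.rand(1, 3, 224, 224)                  # placeholder image in [0, 1]
    label = model(x).argmax(dim=1).item()
    x_adv = local_fgsm(model, model.layer4[-1], x, label)

The same masking idea carries over to iterative attacks by multiplying each step's gradient sign by the mask before updating the image.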
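Similarly, below is a minimal sketch of the local feature squeezing detector in contribution (2), under the assumption that, as in standard feature squeezing, detection compares the model's softmax outputs before and after squeezing. Only two squeezers (bit-depth reduction and median smoothing) are shown rather than the three used in the thesis; squeezing is restricted to the salient region, and the bit depth, kernel size, and threshold are illustrative values. The `mask` argument can be produced by the `grad_cam_mask` helper from the previous sketch.

import torch
import torch.nn.functional as F


def reduce_bit_depth(x, bits=4):
    """Quantize pixel values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels


def median_smooth(x, k=3):
    """Per-channel k x k median filter implemented with unfold."""
    pad = k // 2
    patches = F.unfold(F.pad(x, (pad,) * 4, mode="reflect"), kernel_size=k)
    med = patches.view(x.size(0), x.size(1), k * k, -1).median(dim=2).values
    return med.view_as(x)


def local_squeeze_detect(model, x, mask, threshold=1.0):
    """Flag `x` as adversarial if squeezing the salient region changes the prediction too much."""
    diffs = []
    for squeeze in (reduce_bit_depth, median_smooth):
        x_sq = mask * squeeze(x) + (1 - mask) * x   # squeeze only inside the Grad-CAM mask
        with torch.no_grad():
            p = F.softmax(model(x), dim=1)
            p_sq = F.softmax(model(x_sq), dim=1)
        diffs.append((p - p_sq).abs().sum().item())  # L1 difference of predictions
    return max(diffs) > threshold

Because L0-norm attacks concentrate their changes on a few pixels inside the model's region of interest, squeezing only that region tends to enlarge the before/after difference for adversarial inputs while leaving clean inputs nearly unchanged, which is the intuition behind the detection-rate gains reported above.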