
Research On Transferability Of Adversarial Samples In Deep Learning Models

Posted on: 2023-10-19  Degree: Master  Type: Thesis
Country: China  Candidate: S H He  Full Text: PDF
GTID: 2558306620954709  Subject: Domain software engineering
Abstract/Summary:
With the great success of deep learning in various fields, the robustness and stability of deep neural networks (DNNs) have attracted increasing attention. However, recent studies have confirmed that almost all DNNs are vulnerable to adversarial examples, which means that deploying DNNs in real-world systems carries significant security risks. Adversarial attacks have already appeared in image recognition, natural language processing, object detection, and other fields. In image classification, adversarial attacks are mainly divided into Type I and Type II attacks; according to whether the attacker can obtain model information, they are further divided into white-box attacks and black-box attacks. Black-box attacks currently fall into two types: query-based black-box attacks and transfer-based black-box attacks (which require no queries). Query-based black-box attacks suffer from low success rates because of the lack of prior knowledge, inefficient use of queries, and limited feedback; most existing solutions rely on intensive querying to obtain information about the target model, yet issuing such a large number of queries against a deployed model is usually infeasible in practice. For transfer-based black-box attacks, the mainstream approach is to craft adversarial examples on a surrogate source model and exploit their transferability to attack unknown models. Because the source model and the unknown black-box model can differ substantially, and adversarial examples tend to overfit the source model, attacking black-box models via transferability remains challenging.

The primary purpose of this thesis is to study the transferability of black-box attack methods. By examining the features that different models extract from the same image, we find that although the shallow features extracted by different models differ slightly, their deep features remain highly similar across models. Therefore, this thesis mainly attacks the deep features of images, so that the features the model extracts become wrong while the transferability of the adversarial examples is improved. Our contributions are as follows:

1. We propose a method for generating adversarial examples based on tampering with deep image features. Using an autoencoder structure with residual connections, the deep features of an image are extracted and then modified or preserved to produce Type I and Type II adversarial examples.

2. We discuss black-box attacks for both Type I and Type II adversarial attacks. The generation of adversarial examples is modeled as an image-to-image mapping, and natural adversarial examples are produced by fitting data features with a neural network.

3. We propose DFMA, a method based on deep neural networks. Although FGSM, DeepFool, and SVAE achieve high attack success rates, their efficiency leaves room for improvement. To address this, DFMA uses an adversarial network to automatically locate and tamper with sample features, improving efficiency and reducing the time needed to generate adversarial examples.
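The central idea described above, perturbing the deep features that different models extract in similar ways so that the adversarial example transfers, can be illustrated with a hooked intermediate layer. The abstract does not detail DFMA's autoencoder and residual architecture, so the following is only a minimal sketch of a generic feature-space attack, assuming a PyTorch classifier (e.g., a torchvision ResNet) in eval mode and a chosen intermediate layer; the function and variable names are illustrative, not from the thesis.

```python
import torch
import torch.nn.functional as F

def deep_feature_attack(model, layer, image, eps=8 / 255, steps=50, alpha=1 / 255):
    """Untargeted (Type II style) feature-space attack sketch: iteratively
    push the deep features of the perturbed image away from those of the
    clean image, keeping the perturbation inside an L-infinity ball."""
    feats = {}
    # Capture the output of the chosen intermediate layer on each forward pass.
    hook = layer.register_forward_hook(lambda m, i, o: feats.__setitem__("out", o))
    with torch.no_grad():
        model(image)
        clean = feats["out"].detach()  # deep features of the clean image
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        model(adv)
        # Maximize the distance between adversarial and clean deep features.
        loss = F.mse_loss(feats["out"], clean)
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()
            adv = image + (adv - image).clamp(-eps, eps)  # project back into the eps-ball
        adv = adv.detach()
    hook.remove()
    return adv
```

For a torchvision ResNet one might pass `model.layer3` as the hooked layer. Maximizing the feature distance corresponds to the untargeted setting; minimizing the distance to a target image's features would give a targeted variant. This is a sketch of the general technique, not the DFMA method itself.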
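For comparison, FGSM, one of the gradient-based baselines the abstract names, is a single gradient step. A minimal PyTorch sketch follows, where `model`, `image`, and `label` are assumed inputs (any differentiable classifier, a batched input tensor, and its ground-truth class indices):

```python
import torch
import torch.nn.functional as F

def fgsm(model, image, label, eps=8 / 255):
    """One-step FGSM: perturb the input along the sign of the loss gradient,
    with the perturbation bounded by eps in the L-infinity norm."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Untargeted attack: step in the direction that increases the loss.
    return (image + eps * image.grad.sign()).detach()
```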
Keywords/Search Tags:Black-box attacks, Deep features, Untargeted attack, Targeted attack, Transferable attack