Research On Black-box Adversarial Sample Generation Methods Applying Integrated Gradients And Generative Models

Posted on: 2024-09-14    Degree: Master    Type: Thesis
Country: China    Candidate: Z L Wang    Full Text: PDF
GTID: 2568306932960929    Subject: Control Science and Engineering
Abstract/Summary:
Adversarial examples are widely present in deep neural network models, posing a serious threat to applications of AI technology in sensitive areas such as autonomous driving and security surveillance. By making slight modifications to the original data, adversarial examples can mislead a model into producing incorrect results without being noticed by humans. Researchers are therefore devoted to developing more advanced methods for generating adversarial examples that deceive models efficiently; such methods are significant both for evaluating the security of network models and for promoting the development of defense techniques. Current research focuses on black-box attacks, in which the attacker only has access to the model's predicted results. Black-box attacks fall into two routes: transfer-based attacks and query-based attacks. Transfer attacks generally have low success rates, while query attacks require many model queries and produce conspicuous samples, making them susceptible to defense techniques. To address these challenges, this dissertation pursues both routes: improving the success rate of transfer attacks, and enhancing the stealthiness of adversarial examples while minimizing the number of queries. The research content and innovative contributions of this dissertation consist of the following three parts:

(1) A transfer attack algorithm based on adaptive guided integrated gradients is proposed. Existing attack methods mostly perturb the entire image without considering the differing impact of different regions on the prediction result, which limits the transferability of adversarial examples across models. This dissertation investigates how to improve the attention mechanism so that the attack can measure the importance of different features. Attention maps based on integrated gradients often contain a considerable amount of irregular noise, which restricts transferability; this is attributed to the straight-line integration path of integrated gradients, which easily passes through unrelated pixel regions and produces ineffective noise. To address this issue, the dissertation uses guided integrated gradients to turn the integration path into an adaptive one and designs a transfer attack algorithm around it. Specifically, multiple fixed points are set on the path from the baseline to the input, and within each segment between adjacent fixed points the path advances along the coordinates with the smaller gradient magnitudes. The iterative attack incorporates strategies such as gradient clipping and momentum memory to smooth the optimization direction. Experimental results demonstrate that the algorithm effectively improves the transferability of adversarial examples.
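To make the mechanics of part (1) concrete, the following is a minimal PyTorch sketch, not the dissertation's exact algorithm. It simplifies the adaptive path to "advance only the lowest-gradient-magnitude coordinates toward the input" and then uses the resulting attribution map to re-weight a standard momentum iterative attack; the function names, the `frac` parameter, and the `0.5 + 0.5 * w` weighting scheme are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def guided_ig_attribution(model, x, baseline, target, steps=64, frac=0.25):
    """Attribution along an adaptive path: at each step, only the fraction of
    coordinates with the smallest |gradient| moves toward the input, and
    grad * delta is accumulated along the way (simplified guided IG; the
    path only approximately reaches x)."""
    cur = baseline.clone()
    attr = torch.zeros_like(x)
    for i in range(steps):
        cur = cur.detach().requires_grad_(True)
        loss = F.cross_entropy(model(cur), target)
        grad = torch.autograd.grad(loss, cur)[0]
        flat = grad.abs().flatten(1)
        k = max(1, int(frac * flat.shape[1]))
        thresh = flat.kthvalue(k, dim=1, keepdim=True).values
        mask = (flat <= thresh).view_as(grad).float()   # low-|gradient| coords
        delta = (x - cur.detach()) * mask / (steps - i) # step them toward x
        attr = attr + grad.detach() * delta             # path-integral term
        cur = cur.detach() + delta
    return attr

def attribution_guided_attack(model, x, y, attr, eps=8 / 255, iters=10, mu=1.0):
    """MI-FGSM-style iterative attack whose per-pixel step is re-weighted by
    the normalized attribution map, so regions the model relies on receive
    larger perturbations (`mu` is the momentum factor)."""
    w = attr.abs()
    w = w / (w.amax(dim=(1, 2, 3), keepdim=True) + 1e-12)
    adv, g = x.clone(), torch.zeros_like(x)
    alpha = eps / iters
    for _ in range(iters):
        adv = adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(adv), y)
        grad = torch.autograd.grad(loss, adv)[0]
        grad = grad / (grad.abs().mean(dim=(1, 2, 3), keepdim=True) + 1e-12)
        g = mu * g + grad                           # momentum memory
        step = alpha * g.sign() * (0.5 + 0.5 * w)   # attribution re-weighting
        adv = (adv.detach() + step).clamp(x - eps, x + eps).clamp(0, 1)
    return adv
```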
(2) A transfer attack algorithm based on an information entropy baseline and an extended integration path is proposed. When integrated gradients move from mathematical definition to engineering practice, two common problems arise. First, the baseline, which should be a reference point carrying no information, has no single suitable choice that applies to all models. Second, considering only the finite integration path from the baseline to the input overlooks effective gradient accumulation beyond the path. To sharpen the attention regions produced by integrated gradients, this dissertation improves the method in two ways; simplified sketches of both constructions follow part (3) below. First, an information entropy baseline is proposed, which measures the amount of information an input carries from the model's perspective. It is trained separately for each model so that the predicted probabilities of all classes are similar, presenting a neutral baseline to the model. Second, an extended integrated gradient method is proposed, which designates the farthest point as a saturation region and supplements the defined path with gradient accumulation from the input to the saturation point. Finally, the dissertation presents a transfer attack algorithm based on the information entropy baseline and the extended path: the improved integrated gradients serve as the optimization direction, combined with gradient smoothing to filter out noise, further improving transfer attack performance.

(3) A query attack algorithm based on a variational autoencoder for distribution fitting is proposed. Query attacks achieve higher success rates than transfer attacks but need a large number of queries, and the resulting adversarial examples exhibit noticeable large-scale perturbation patterns that can arouse the suspicion of regulators. This issue is closely related to the deviation of adversarial examples from the original data distribution, so the dissertation investigates how to craft highly stealthy adversarial examples from the perspective of data distribution. Specifically, a variational autoencoder is first trained on the original dataset as a tool for extracting intermediate distribution parameters. A natural evolution strategy then searches for adversarial points in the vicinity of those distribution parameters. Finally, operations such as decoding and clipping map the adversarial points back into the feasible domain to produce the final result. Experimental results demonstrate that the method produces adversarial examples without noticeable perturbation patterns using few queries; even when the model employs various defense techniques, the method achieves higher success rates and shows significant advantages over existing algorithms.
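The sketch below illustrates the two constructions of part (2) under stated assumptions: the entropy baseline is fit by pushing the model's predictive distribution toward uniform (one simple way to obtain a "neutral" input), and the extended path is approximated as a second straight-line segment from the input toward a saturation point `x + ext * (x - baseline)`. Both the optimization recipe and the choice of saturation point are simplified guesses, not the thesis's exact construction.

```python
import torch
import torch.nn.functional as F

def fit_entropy_baseline(model, shape, steps=200, lr=0.05):
    """Optimize one input so the model predicts a near-uniform distribution
    over classes -- a 'neutral', maximum-entropy reference point."""
    b = torch.rand(shape, requires_grad=True)
    opt = torch.optim.Adam([b], lr=lr)
    for _ in range(steps):
        logp = F.log_softmax(model(b), dim=1)
        uniform = torch.full_like(logp, 1.0 / logp.shape[1])
        loss = F.kl_div(logp, uniform, reduction="batchmean")  # KL(uniform || p)
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            b.clamp_(0, 1)   # keep the baseline a valid image
    return b.detach()

def extended_integrated_gradients(model, x, baseline, target, steps=32, ext=0.5):
    """Standard IG on the baseline -> x segment, plus extra gradient
    accumulation on an extended segment from x toward a saturation point."""
    def segment(a, b):
        total = torch.zeros_like(x)
        for i in range(steps):
            pt = (a + (b - a) * (i + 0.5) / steps).detach().requires_grad_(True)
            grad = torch.autograd.grad(F.cross_entropy(model(pt), target), pt)[0]
            total = total + grad * (b - a) / steps   # Riemann-sum path integral
        return total
    x_sat = (x + ext * (x - baseline)).clamp(0, 1)
    return segment(baseline, x) + segment(x, x_sat)
```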
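Finally, a minimal sketch of the part (3) pipeline, assuming a pre-trained VAE exposing `encode(x) -> (mu, logvar)` and `decode(z) -> image` (a hypothetical interface, not a standard API). Natural evolution strategies estimate the latent-space gradient of an untargeted attack loss from score queries alone, and every candidate is decoded and clipped back into an epsilon-ball around the original input.

```python
import torch
import torch.nn.functional as F

def decode_and_clip(vae, z, x, eps):
    """Decode latent points and project the result into the L-infinity
    eps-ball around x, keeping the perturbation bounded and the image valid."""
    with torch.no_grad():
        return vae.decode(z).clamp(x - eps, x + eps).clamp(0, 1)

def nes_latent_attack(model, vae, x, y, queries=2000, pop=20,
                      sigma=0.1, lr=0.02, eps=16 / 255):
    """Untargeted score-based attack: search near the encoder mean with a
    natural-evolution-strategies gradient estimate (antithetic sampling)."""
    with torch.no_grad():
        mu, _ = vae.encode(x)            # centre the search on the latent mean
    z, used = mu.clone(), 0
    while used < queries:
        noise = torch.randn(pop // 2, *z.shape[1:], device=z.device)
        dirs = torch.cat([noise, -noise])                # antithetic pairs
        with torch.no_grad():
            adv = decode_and_clip(vae, z + sigma * dirs, x, eps)
            losses = F.cross_entropy(model(adv), y.expand(pop), reduction="none")
        used += pop
        # NES estimate of d(loss)/dz; ascend to push the prediction off y
        g = (losses.view(-1, *[1] * (z.dim() - 1)) * dirs).mean(0, keepdim=True) / sigma
        z = z + lr * g
        adv = decode_and_clip(vae, z, x, eps)
        used += 1
        if model(adv).argmax(1).item() != y.item():      # success check
            return adv
    return decode_and_clip(vae, z, x, eps)
```

Searching in the VAE's latent space keeps candidates near the learned data manifold, which is the intuition behind the suppressed perturbation patterns claimed in part (3).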
Keywords/Search Tags: Adversarial samples, Deep neural network, Transferability, Integrated gradients, Query-based attack