| Deep neural networks (DNNs) have achieved great success in computer vision, speech recognition, natural language processing, and other fields. However, because of their opacity, users who receive a decision often cannot understand the prediction process, what effective features the model has learned, or how the prediction and judgment are made; the whole process lacks explanation. Interpretability methods address this by augmenting the model's output with an "explanation" of its decisions. These interpretation regions, however, also hand an advantage to attackers. In this thesis, we find that interpretability methods naturally provide specific regions for the generation of adversarial samples. Moreover, some interpretability methods are built on the Saliency Map (SM), which assigns a value to each pixel in the region quantifying its influence on the prediction of the neural network model. This suggests a new way to generate adversarial samples: use the interpretation region as the candidate region that constrains the perturbation, and further refine the position of each perturbed pixel with the saliency map. Based on the above observations, this thesis conducts the following research on the security of interpretability in deep learning:

(1) This thesis verifies the feasibility of using the saliency map to mount adversarial attacks in the white-box setting. Considering the diversity of interpretability methods, it further proposes a dynamic genetic algorithm to generate adversarial samples in the black-box setting. "Dynamic" emphasizes the changing trade-off between the number of perturbed pixels and the magnitude of the perturbation: the optimal set of perturbed pixels is found by gradual approximation, and the perturbation is added over multiple rounds of the genetic algorithm. Compared with the traditional genetic algorithm, the fitness function is improved to guide the generation of adversarial samples through the change of the interpretation region. Experimental results show that this method deceives different neural network models with an average success rate of 92.88% under controllable time complexity.

(2) This thesis then enhances the robustness of interpretability methods against the above attacks on the image interpretation region. First, based on the fact that reducing the principal curvature of the network model smooths the saliency map, the activation function ReLU is replaced by Softplus, which effectively reduces the variation of the interpretation region. Second, the perturbation set crafted by the adversary is disrupted by adding Gaussian noise to the adversarial sample several times in advance and averaging the resulting gradients (a mean-gradient idea). The experiments discuss in detail the influence of the hyperparameter β, the standard deviation of the noise, and the number of samples on the interpretation region. The results show that, when facing adversarial samples, the SSIM value of the improved interpretation region increases by 84% and the MSE value decreases by 65%, which significantly enhances the robustness of the interpretation region.
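
As an illustration of the white-box idea in (1), the sketch below restricts a one-step gradient-sign perturbation to the most salient pixels of the input. This is a minimal sketch, not the thesis's exact algorithm; the model handle, the top-k size `k`, and the step size `eps` are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def saliency_map(model, x, label):
    """Gradient-based saliency map: per-pixel |d(score of `label`) / d(x)|."""
    x = x.clone().detach().requires_grad_(True)
    model(x)[0, label].backward()
    return x.grad.abs().sum(dim=1, keepdim=True)    # collapse colour channels -> (1, 1, H, W)

def perturb_salient_pixels(model, x, label, k=100, eps=0.05):
    """Restrict a sign-of-gradient perturbation to the k most salient pixels (illustrative values)."""
    sal = saliency_map(model, x, label)
    mask = torch.zeros_like(sal).flatten()
    mask[sal.flatten().topk(k).indices] = 1.0       # candidate region taken from the saliency map
    mask = mask.view_as(sal)

    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), torch.tensor([label]))
    loss.backward()
    step = eps * x_adv.grad.sign() * mask           # perturb only inside the candidate region
    return (x + step).clamp(0, 1).detach()
```

Because the mask is derived from the saliency map, only pixels that the interpretation method itself marks as influential are modified, which is the candidate-region constraint described above.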
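The black-box dynamic genetic algorithm is not fully specified in this abstract, but its key departure from a traditional genetic algorithm is a fitness function that also rewards change in the interpretation region. The sketch below shows one plausible form of such a fitness; the weights `alpha` and `beta` and the particular terms combined are illustrative assumptions, not the thesis's formula.

```python
import numpy as np

def fitness(predict, interp_mask, x_adv, x_clean, true_label, alpha=1.0, beta=0.5):
    """Toy fitness for a black-box, interpretation-guided genetic attack.

    predict(x)     -> class-probability vector (query access only)
    interp_mask(x) -> binary interpretation-region mask for input x
    """
    attack_term = -predict(x_adv)[true_label]                           # reward lowering the true-class confidence
    region_shift = np.mean(interp_mask(x_adv) != interp_mask(x_clean))  # reward shifting the interpretation region
    size_penalty = np.mean(np.abs(x_adv - x_clean))                     # keep the perturbation small
    return alpha * attack_term + beta * region_shift - size_penalty
```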
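For the defense in (2), the two measures can be sketched as a recursive ReLU-to-Softplus replacement (lowering the model's curvature and thus smoothing the saliency map) and a mean-gradient saliency computed over several Gaussian-noised copies of the input. This is again a minimal sketch under assumed defaults; the Softplus β, the noise standard deviation σ, and the number of noisy samples are the hyperparameters whose influence the experiments examine.

```python
import torch

def replace_relu_with_softplus(module, beta=10.0):
    """Recursively swap every ReLU for Softplus(beta) to reduce the model's curvature."""
    for name, child in module.named_children():
        if isinstance(child, torch.nn.ReLU):
            setattr(module, name, torch.nn.Softplus(beta=beta))
        else:
            replace_relu_with_softplus(child, beta)

def mean_gradient_saliency(model, x, label, sigma=0.1, n_samples=20):
    """Average the input gradient over several Gaussian-noised copies of x,
    so a pixel-precise adversarial perturbation no longer dominates the map."""
    grads = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy = (x + sigma * torch.randn_like(x)).detach().requires_grad_(True)
        model(noisy)[0, label].backward()
        grads += noisy.grad
    return (grads / n_samples).abs().sum(dim=1, keepdim=True)
```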