Safety And Security Analysis On Asynchronous Advantage Actor-Critic Model

Posted on: 2023-04-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Chen
Full Text: PDF
GTID: 1528306845997409
Subject: Cyberspace security
Abstract/Summary:
Reinforcement learning (RL), with its capacity for autonomous learning, has significantly driven the development of artificial intelligence. With the rise of deep learning, the state of the art in RL has improved dramatically, ushering in the era of deep reinforcement learning (DRL). Among mainstream DRL methods, asynchronous advantage actor-critic (A3C), with its distinctive asynchronous parallel framework, has led a promising DRL revolution driven by parallel computing. As more and more A3C models are deployed in highly security-sensitive application scenarios, security issues are becoming increasingly prominent, including robustness threats to the A3C model, adversarial attack threats, and privacy leakage threats. These issues have become key barriers to the wide adoption of A3C. Taking into account the characteristics of parallel computing in the asynchronous framework, this dissertation studies the relevant security concerns of the A3C model. The main research content and contributions comprise the following three aspects:

(1) To address the weak decision-making phenomenon in the training process of the A3C model, a robustness assessment method for A3C is proposed. During A3C training, weak decision-making may arise from a variety of complex factors, such as an improper learning rate setting, a small number of agents, the size of the state space, and the distribution of initial states used for agent training. In the presence of such phenomena, existing robustness assessment methods for DRL models, such as those based on neuron coverage or on model performance, cannot be used to measure the robustness of the A3C model. A3C robustness assessment has therefore become one of the key challenges in DRL security research. Targeting the weak decision-making phenomenon within the asynchronous framework, this work proposes two novel robustness assessment metrics, skewness and sparseness. It further proposes static and dynamic robustness assessment methods for the A3C model. By extracting state values throughout the whole A3C life cycle, these methods achieve fine-grained, process-level quantifiable robustness assessment of the A3C model.

(2) To support penetration testing of the A3C model, a retraining attack method targeting a security vulnerability of A3C is proposed. The A3C model exhibits a retraining trigger vulnerability caused by parallel computing in the asynchronous framework. This vulnerability is easily exploited by adversarial attacks and seriously undermines the security of the A3C model. However, recent DRL security research on adversarial attacks overlooks the security analysis of the A3C retraining mechanism under the asynchronous framework, and hardly any related work has been found. This dissertation analyzes the vulnerability of the retraining mechanism of the A3C model under the asynchronous framework and proposes a retraining attack construction method that uses gradient band construction and an exhaustive policy. Penetration testing is conducted to thoroughly analyze the security vulnerability of the A3C model and to verify the existence of the retraining attack vulnerability. In addition, novel quantitative indexes of retraining attack effect are designed, and several suggestions for defending A3C models against retraining attacks are offered.

(3) To protect hyperparameter privacy, an adversarial trajectory generation method addressing privacy leakage of the A3C reward function is proposed. The reward function is a critical hyperparameter of a DRL model and requires careful design in advance, based on expert knowledge and practical testing. Its design is closely tied to DRL performance, and it constitutes private model information of high commercial value. At present, most studies on protecting reward function privacy in DRL models consider only the perspective of countering the attacker, ignoring the possibility of causing catastrophic states that may greatly reduce A3C model performance. To address this issue, this dissertation proposes an assessment method that measures the degree of privacy leakage of the A3C reward function based on reward clustering mutual information. Meanwhile, to avoid catastrophic states, a trajectory generation method based on critical state selection and intrinsic fear model construction is proposed, which increases the difficulty of reverse-engineering the hyperparameter and protects the privacy of the model's reward function to some extent.
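
As a rough illustration of the two metric names in contribution (1), the sketch below computes a skewness and a sparseness score over a list of state values. The abstract does not define either metric, so both formulas are assumptions: skewness is taken as the standard third standardized moment, and sparseness as Hoyer's L1/L2 measure. The function names and the sample `state_values` are hypothetical and do not come from the dissertation.

```python
import math


def skewness(values):
    """Third standardized moment (population form); an assumed definition."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant input has no asymmetry
    return m3 / m2 ** 1.5


def hoyer_sparseness(values):
    """Hoyer's sparseness in [0, 1]; an assumed stand-in for 'sparseness'.

    Returns 1 when a single entry is non-zero and 0 when all entries
    have equal magnitude.
    """
    n = len(values)
    l1 = sum(abs(v) for v in values)
    l2 = math.sqrt(sum(v * v for v in values))
    if n < 2 or l2 == 0:
        return 0.0
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


# Hypothetical state values sampled from a critic during training.
state_values = [0.1, 0.1, 0.1, 0.1, 5.0]
print(skewness(state_values), hoyer_sparseness(state_values))
```

Under these assumed definitions, a critic whose state values are dominated by a few large entries would score high on both measures. Whether that corresponds to the weak decision-making phenomenon described above is something only the full text can confirm.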
Keywords/Search Tags: asynchronous deep reinforcement learning, inverse reinforcement, robustness assessment, retraining attack, privacy protection, adversarial trajectory