Research on Model Stealing Technology Based on Deep Reinforcement Learning

Posted on: 2024-08-30  Degree: Master  Type: Thesis
Country: China  Candidate: J J Zhang  Full Text: PDF
GTID: 2568307067973339  Subject: Computer technology
Abstract/Summary:
Deep reinforcement learning combines reinforcement learning with deep learning. In recent years it has achieved rapid breakthroughs in artificial intelligence and has been applied in many fields, such as financial trading, autonomous driving, and network security. As its use has spread, more and more researchers have begun to study its security. Existing studies show that deep reinforcement learning is vulnerable both to adversarial examples and to model stealing. Attackers can use model stealing techniques to steal a target model and cause its owner to lose commercial value, so protecting deep reinforcement learning models is very important. This thesis studies model stealing in deep reinforcement learning from the attacker's point of view, which is of significant research value for building safe and reliable deep reinforcement learning models.

Existing model stealing techniques for deep reinforcement learning suffer from high training cost and high stealing risk. To address these problems, this thesis takes a Deep Q Network (DQN) as the target model and proposes two model stealing techniques for deep reinforcement learning: one based on Q-value access and one based on Q-value estimation.

(1) A model stealing technique based on Q-value access. The goal of this scheme is to steal the target DQN. It first steals the target model's training data, then uses the stolen data to train a local surrogate model, and finally evaluates whether the surrogate's performance is close to that of the target model. To address the high training cost and high stealing risk of existing techniques, the scheme screens the data while stealing the target model's training data, using Q-value access to identify the key data. The scheme therefore steals the target model with only a small amount of key data, which reduces the cost of model stealing.

(2) A model stealing technique based on Q-value estimation. This scheme improves on the first, targeting its high storage cost, low data screening efficiency, and low concealment. It first trains a data selection model, then uses that model to select key data, then trains the local surrogate model on the key data, and finally evaluates the surrogate's performance. Compared with the Q-value access technique, the Q-value estimation scheme uses the data selection model to select key data, which avoids both storing unimportant data during data selection and frequently accessing the target model's Q network. The scheme therefore improves both the efficiency and the concealment of model stealing.
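
The three steps of the first scheme (screen the data through Q-value access, train a local surrogate, evaluate it against the target) can be illustrated with a toy sketch. The abstract does not state the screening criterion, so the Q-gap score below (best Q value minus second best) is purely an assumption, as are the tabular stand-in for the target DQN and all function names; this is not the thesis's implementation.

```python
import random

N_STATES, N_ACTIONS = 200, 4
rng = random.Random(0)

# Hypothetical stand-in for the target DQN: a fixed Q-table.
# A real target would be a neural network queried by the attacker.
TARGET_Q = [[rng.random() for _ in range(N_ACTIONS)] for _ in range(N_STATES)]

def target_q_values(state):
    """Q-value access: one query to the target's Q function."""
    return TARGET_Q[state]

def greedy_action(q_values):
    """Index of the largest Q value."""
    return max(range(len(q_values)), key=q_values.__getitem__)

def q_gap(q_values):
    """Assumed importance score: best Q value minus second-best Q value."""
    top, second = sorted(q_values, reverse=True)[:2]
    return top - second

def steal_key_data(states, threshold):
    """Keep only (state, action) pairs whose Q gap is at least `threshold`."""
    key_data = []
    for s in states:
        q = target_q_values(s)
        if q_gap(q) >= threshold:
            key_data.append((s, greedy_action(q)))
    return key_data

def train_surrogate(key_data, default_action=0):
    """Tabular behavior cloning: memorise the target's action on key states."""
    table = dict(key_data)
    return lambda s: table.get(s, default_action)

def agreement(policy, states):
    """Fraction of states where the surrogate picks the target's greedy action."""
    hits = sum(policy(s) == greedy_action(target_q_values(s)) for s in states)
    return hits / len(states)

observed = list(range(N_STATES))
key_data = steal_key_data(observed, threshold=0.2)
surrogate = train_surrogate(key_data)
key_states = [s for s, _ in key_data]

print(len(key_data), "of", len(observed), "samples kept")
print("agreement on key states:", agreement(surrogate, key_states))
print("agreement on all states:", agreement(surrogate, observed))
```

Note that every observed state still costs one Q-value query here. That repeated access to the target's Q network is the overhead the second scheme aims to avoid by selecting data with a separately trained selection model.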
Keywords/Search Tags:Deep Reinforcement Learning, Model Stealing, Model Security