| The Internet of Things (IoT) is a comprehensive application that integrates various information technology achievements and is regarded as another information technology revolution after the Internet. It has great research value for industry transformation and societal development. In recent years, China’s IoT industry has expanded significantly, and COVID-19 has accelerated the in-depth development of IoT applications. Moreover, digital transformation and upgrading are driving the comprehensive integration of IoT with thousands of industries, making it an important part of new infrastructure. However, research on key IoT technologies such as data security and privacy protection remains relatively scarce. To address these challenges, scholars have begun exploring blockchain-based solutions, which provide advantages in terms of security, anonymity, and decentralization. Nevertheless, problems remain, such as low resource utilization of IoT node devices and difficulty in ensuring fairness and differentiation among nodes in the IoT. This thesis presents the research and design of efficient blockchain consensus protocols based on multi-agent deep reinforcement learning for IoT scenarios, aiming to address these problems and provide theoretical support for accelerating the application of key blockchain technologies in IoT systems. First, this thesis aims to improve the efficiency and fairness of blockchain consensus protocols in IoT networks using reinforcement learning. An efficient consensus protocol based on reinforcement learning is proposed to improve the efficiency and fairness of consensus for miners in the blockchain system. The protocol is designed on the basis of the Proof-of-Communication (PoC) scheme in a single-hop wireless network with unreliable communications. A distributed multi-agent reinforcement learning (MARL) algorithm is introduced to tune actions and rewards in an actor-critic framework to seek effective performance. Empirical results from simulations show that 
the proposed algorithm guarantees the fairness of consensus and nearly reaches a centralized optimal solution. Second, because an attacker can target a blockchain network by deploying a large number of malicious nodes, it is necessary to increase the differentiation between normal nodes and malicious nodes to enhance the security, flexibility, and availability of consensus protocols in the Internet of Things. Thus, while ensuring high-efficiency performance, a weight mechanism is introduced before the consensus process to pre-limit the weight of each node. A weighted consensus algorithm based on reinforcement learning is proposed according to this weight mechanism. Once converged, the algorithm keeps the share of each node in the consensus process consistent with the preset weight of that node. In this way, the impact of malicious behavior is reduced, the security of the blockchain system is enhanced, and the blockchain becomes usable in a wider range of scenarios. Simulation experiments show that, with preset weights, the algorithm ensures maximum consensus efficiency and increases the flexibility of the consensus protocol. |
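
The weighted-consensus property described above (each node's share of participation matching its preset weight) can be checked numerically. The following is a minimal sketch, not the thesis's evaluation method: it assumes a hypothetical log of which node won each consensus round, assumes all weights are positive, and uses Jain's fairness index on weight-normalized shares as one plausible measure. The function names and the sample data are illustrative only.

```python
from collections import Counter


def participation_shares(winners, node_ids):
    """Fraction of consensus rounds won by each node (hypothetical round log)."""
    if not winners:
        raise ValueError("winners must contain at least one round")
    counts = Counter(winners)
    total = len(winners)
    return {n: counts.get(n, 0) / total for n in node_ids}


def jain_index(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2); equals 1.0 when all x are equal."""
    values = list(values)
    s = sum(values)
    sq = sum(v * v for v in values)
    return (s * s) / (len(values) * sq) if sq > 0 else 0.0


def weighted_fairness(shares, weights):
    """Jain's index of share / normalized weight; 1.0 means shares match the weights.

    Assumes every weight is strictly positive.
    """
    total_w = sum(weights.values())
    normalized = [shares[n] / (weights[n] / total_w) for n in weights]
    return jain_index(normalized)


# Illustrative data: three nodes with preset weights 5:3:2.
weights = {"a": 5, "b": 3, "c": 2}
winners = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
shares = participation_shares(winners, weights)
matched = weighted_fairness(shares, weights)
# The same shares judged against equal weights score lower.
unweighted = weighted_fairness(shares, {"a": 1, "b": 1, "c": 1})
```

With equal weights the measure reduces to plain fairness among miners, so the same check covers both the fair protocol and the weighted one.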