With the development of computer hardware and software technology, a series of high-speed networks and new storage hardware technologies have emerged in recent years. On the network side, Remote Direct Memory Access (RDMA) can achieve latencies below two microseconds and message rates of up to two million per second. On the storage side, SSDs based on the Non-Volatile Memory Express (NVMe) host controller interface specification can reach 800,000 IOPS. As an important component of many distributed systems, message middleware is widely used to decouple distributed applications in time, space, and process, and to transfer information between modules. However, most existing message middleware is designed for traditional hardware and cannot take full advantage of the features of new hardware: the I/O paths for network transmission and data storage incur considerable unnecessary overhead, which results in poor latency performance. Focusing on the optimization of low-latency message middleware, this thesis designs and implements Thunder MQ, a low-latency message middleware oriented toward new hardware, and evaluates its overall performance experimentally from multiple perspectives. The main research contents and results are as follows:

(1) Optimization of communication-layer transmission performance. To improve inter-module communication in the message middleware, this thesis proposes an RDMA communication framework based on a hybrid polling strategy, which combines the fast response of polling mode with the low CPU usage of event-driven mode. Experiments show that, compared with a traditional TCP/IP network, this framework significantly improves the throughput and reduces the latency of inter-module communication. In addition, this thesis proposes SW-ICEEMDAN, an improved signal decomposition algorithm based on a sliding-window mechanism that solves the endpoint effect problem of the original algorithm, together with a polling-period adaptive control algorithm based on a retry mechanism and an SW-ICEEMDAN-ARIMA traffic prediction model. Together they ensure that the communication layer maintains reasonable CPU utilization under all kinds of network traffic loads.

(2) Optimization of the transmission I/O path. To eliminate the unnecessary overhead caused by synchronization and context switching between RDMA threads and storage threads, this thesis proposes a run-to-completion thread model that, following the zero-copy principle, merges RDMA network transfer and NVMe data storage into the same thread. The model makes full use of the multi-queue feature of NVMe storage hardware to improve processing efficiency during message handling, and it provides faster message transfer and multi-replica synchronization than comparable message queues.

(3) Optimization of the memory management mechanism. To reduce the time overhead of memory registration for each RDMA transfer, this thesis proposes a CPU-core-bound, user-space high-speed memory pool based on the principle of locality. The memory pool uses a hierarchical design: a global memory pool provides resources for all CPUs, while local memory pools bound to individual CPU cores use the buddy algorithm to handle frequent RDMA memory allocation and release requests, which further reduces communication latency.
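The hybrid polling strategy in (1) can be illustrated with a small sketch. This is not Thunder MQ's implementation: a Python `queue.Queue` stands in for an RDMA completion queue, and the class `HybridPoller` and its `spin_limit` parameter are hypothetical. The poller first makes up to `spin_limit` non-blocking checks, which gives the fast response of polling mode, and only then falls back to a blocking wait, which lets an idle thread sleep as in an event-driven design.

```python
import queue
import threading
from typing import Any, Optional


class HybridPoller:
    """Hypothetical hybrid poller: busy-poll first, then block.

    `queue.Queue` stands in for an RDMA completion queue (CQ); `spin_limit`
    is an illustrative tuning knob, not a value taken from Thunder MQ.
    """

    def __init__(self, cq: "queue.Queue[Any]", spin_limit: int = 1000) -> None:
        self.cq = cq
        self.spin_limit = spin_limit
        self.polled = 0   # completions found during busy polling
        self.evented = 0  # completions obtained after falling back to blocking

    def next_completion(self, timeout: Optional[float] = None) -> Any:
        # Polling mode: non-blocking checks give the lowest response latency
        for _ in range(self.spin_limit):
            try:
                item = self.cq.get_nowait()
            except queue.Empty:
                continue
            self.polled += 1
            return item
        # Event-driven mode: block so an idle thread stops burning CPU;
        # raises queue.Empty if nothing arrives within `timeout` seconds
        item = self.cq.get(timeout=timeout)
        self.evented += 1
        return item


if __name__ == "__main__":
    cq: "queue.Queue[str]" = queue.Queue()
    poller = HybridPoller(cq, spin_limit=100)
    cq.put("wc-1")  # already waiting, so it is found while polling
    print(poller.next_completion())
    # "wc-2" arrives later, after the spin budget is used up
    threading.Timer(0.05, cq.put, args=("wc-2",)).start()
    print(poller.next_completion(timeout=2.0))
    print(poller.polled, poller.evented)
```

With real verbs, the event-driven branch would typically arm the CQ with `ibv_req_notify_cq`, poll once more to catch completions that arrived before arming, and then block in `ibv_get_cq_event`. An adaptive controller like the one described in (1) could plausibly tune the spin budget at run time instead of keeping it fixed.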
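The local memory pools in (3) rely on the buddy algorithm. The following minimal sketch assumes that each per-core local pool manages a buffer of `2**max_order` fixed-size units that was registered for RDMA once, when the global pool handed it out, so that allocation and release never trigger a new registration. The class and method names are hypothetical, and neither the global pool nor the binding to CPU cores is modeled.

```python
from typing import Dict, List, Set


class BuddyAllocator:
    """Hypothetical per-core buddy allocator over a region of 2**max_order units.

    Offsets are abstract units (e.g. fixed-size chunks of a buffer that was
    registered for RDMA once, up front); registration itself is not modeled.
    """

    def __init__(self, max_order: int) -> None:
        self.max_order = max_order
        # free_lists[k] holds the offsets of free blocks of 2**k units
        self.free_lists: List[Set[int]] = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)
        self.allocated: Dict[int, int] = {}  # offset -> order

    @staticmethod
    def order_for(size: int) -> int:
        # Smallest k such that 2**k >= size
        if size <= 0:
            raise ValueError("size must be positive")
        return (size - 1).bit_length()

    def alloc(self, size: int) -> int:
        order = self.order_for(size)
        k = order
        while k <= self.max_order and not self.free_lists[k]:
            k += 1  # find the smallest free block that is large enough
        if k > self.max_order:
            raise MemoryError("local pool exhausted")
        offset = self.free_lists[k].pop()
        while k > order:  # split, returning each upper half to a free list
            k -= 1
            self.free_lists[k].add(offset + (1 << k))
        self.allocated[offset] = order
        return offset

    def free(self, offset: int) -> None:
        order = self.allocated.pop(offset)
        while order < self.max_order:  # merge with the buddy while it is free
            buddy = offset ^ (1 << order)
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            offset = min(offset, buddy)
            order += 1
        self.free_lists[order].add(offset)
```

Assuming only the thread pinned to a given core touches that core's allocator, this fast path needs no cross-core locking. The buddy scheme rounds requests up to powers of two, trading some internal fragmentation for splits and merges that take at most `max_order` steps.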