Research On Performance Optimization Methods For Kafka Message Systems

Posted on: 2024-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Huang
GTID: 2568307157980949
Subject: Information and Communication Engineering

Abstract/Summary:
In the era of big data, Kafka messaging systems are widely used to handle massive volumes of data because of their high throughput, low latency, and high fault tolerance. However, as data volumes keep growing and high-concurrency scenarios become more common, Kafka deployments run into performance problems such as low throughput and high latency, caused by improper configuration and by skewed load across cluster nodes. To address these problems, this thesis proposes two performance optimization methods: automatic configuration tuning and load balancing. First, for automatic configuration, a generative adversarial network combined with a self-attention mechanism learns to generate good configuration combinations, and deploying these configurations improves Kafka’s throughput and reduces its latency. Second, for load balancing, a performance comparison model based on random forest is built to select the most suitable server in the Kafka cluster to receive each new task. This keeps the cluster load relatively uniform, which increases the throughput of the Kafka messaging system and reduces its latency. The specific research content and innovations of this thesis are as follows:

(1) An Automatic Configuration Tuning using Self-Attention and Generative Adversarial Network (ACT-SAGAN) algorithm is proposed. Ordinary users lack a deep understanding of Kafka’s configuration parameters and cannot tune them for their specific application environment, which results in low throughput and high latency when large amounts of data are processed. First, a self-attention mechanism is added to the generative adversarial network model to capture, from well-configured combinations, the hidden structures and their correlations with the configuration parameters. Second, these hidden structures and correlations are used to generate better configuration combinations, and deploying them improves Kafka’s performance. Because the method reduces the number of system runs and does not require building a prediction model, it is significantly more efficient. Finally, experimental results show that, compared with Kafka under the default configuration, the algorithm increases throughput by 78.60%, reduces average latency by 26.95%, and reduces maximum latency by 39.84%.

(2) A load balancing performance improvement algorithm based on a Kafka performance comparison model is proposed. Kafka cannot perceive the load status of the servers in its cluster. When large amounts of data are processed, an unbalanced load strategy therefore causes data skew, and the resulting high load on single nodes lowers Kafka’s throughput and raises its latency. First, the algorithm considers the impact of CPU utilization, memory utilization, and disk utilization on Kafka’s performance, and handles new tasks based on the load status of the cluster servers. Second, a performance comparison model is built to predict the performance ranking of a new task on each node. Finally, the highest-ranked server node is selected, which keeps the cluster load relatively uniform and improves Kafka’s performance. Experimental results show that, compared with Kafka in its default state, distributing tasks to the servers selected by the proposed algorithm increases throughput by 57.32%, reduces average latency by 24.31%, and reduces maximum latency by 37.329%.
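The self-attention step in contribution (1) lets the generator relate configuration parameters to one another. Below is a minimal plain-Python sketch of scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V, over a set of parameter tokens. It covers only the attention layer, not the full ACT-SAGAN generator and discriminator. The parameter names, the 3-dimensional token encoding, and the identity projection matrices are illustrative assumptions. In a trained model the projections `wq`, `wk`, and `wv` would be learned.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x m) and an (m x p) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention over the rows of x.

    Each row of x is one configuration-parameter token (hypothetical encoding).
    Returns (outputs, weights), where weights[i][j] is how strongly
    parameter i attends to parameter j.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(k[0]))  # sqrt(d_k)
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    # Illustrative Kafka parameters and made-up 3-d token encodings;
    # the thesis does not state which parameters it tunes or how it encodes them.
    params = ["batch.size", "linger.ms", "compression.type", "num.io.threads"]
    x = [[0.2, 1.0, 0.0], [0.8, 0.0, 1.0], [0.5, 0.5, 0.5], [0.1, 0.0, 0.0]]
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    _, weights = self_attention(x, identity, identity, identity)
    for name, row in zip(params, weights):
        print(f"{name:>18}: " + " ".join(f"{w:.2f}" for w in row))
```

Each row of `weights` sums to 1, so every output row is a weighted mix of all parameter tokens. This is one way a dependency between parameters, such as producer batch size versus linger time, could be represented inside the generator.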
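The node-selection step in contribution (2) can be read as rank-and-select. Each broker is scored from its CPU, memory, and disk utilization, the brokers are ranked, and the new task goes to the top one. The following is a minimal sketch, assuming per-broker utilization has already been collected as fractions in [0, 1]. `NodeLoad`, `headroom_score`, `rank_nodes`, and `select_node` are hypothetical names. `headroom_score` is a simple weighted-headroom heuristic with made-up weights that stands in for the thesis's random-forest performance comparison model; it does not reproduce that model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# A scorer maps (cpu, memory, disk) utilization to a predicted performance
# score for placing a new task on that node; higher is better.
Scorer = Callable[[Sequence[float]], float]


@dataclass(frozen=True)
class NodeLoad:
    """Load snapshot of one Kafka broker; utilizations are fractions in [0, 1]."""
    broker_id: int
    cpu: float
    memory: float
    disk: float

    def features(self) -> Tuple[float, float, float]:
        return (self.cpu, self.memory, self.disk)


def headroom_score(features: Sequence[float],
                   weights: Tuple[float, float, float] = (0.5, 0.2, 0.3)) -> float:
    """Hypothetical stand-in for the random-forest comparison model:
    a weighted sum of free capacity (1 - utilization). Weights are made up."""
    return sum(w * (1.0 - u) for w, u in zip(weights, features))


def rank_nodes(nodes: List[NodeLoad], score_fn: Scorer = headroom_score) -> List[NodeLoad]:
    """Order nodes from best to worst predicted performance for a new task."""
    return sorted(nodes, key=lambda n: score_fn(n.features()), reverse=True)


def select_node(nodes: List[NodeLoad], score_fn: Scorer = headroom_score) -> NodeLoad:
    """Pick the highest-ranked node as the target for the new task."""
    if not nodes:
        raise ValueError("no candidate nodes")
    return rank_nodes(nodes, score_fn)[0]


if __name__ == "__main__":
    cluster = [
        NodeLoad(broker_id=0, cpu=0.85, memory=0.60, disk=0.70),
        NodeLoad(broker_id=1, cpu=0.30, memory=0.40, disk=0.50),
        NodeLoad(broker_id=2, cpu=0.55, memory=0.90, disk=0.20),
    ]
    print([n.broker_id for n in rank_nodes(cluster)])  # [1, 2, 0]
    print(select_node(cluster).broker_id)              # 1
```

Keeping the scorer pluggable separates the selection logic from the model. As an assumption about how a learned model could be wired in, a fitted scikit-learn `RandomForestRegressor` named `model` could be passed as `lambda f: float(model.predict([list(f)])[0])`. The thesis describes its model as predicting a performance *ranking*, which may be learned differently from this pointwise score.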
Keywords/Search Tags: Kafka, Performance optimization, Generative Adversarial Networks (GANs), Random forest, Load balancing