Research On Performance Optimization Methods For Kafka Message Systems

Posted on: 2024-08-04

Degree: Master

Type: Thesis

Country: China

Candidate: Y T Huang

GTID: 2568307157980949

Subject: Information and Communication Engineering

Abstract/Summary:

In the era of big data, Kafka messaging systems are widely used to handle massive amounts of data because of their high throughput, low latency, and high fault tolerance. However, as data volumes grow and high-concurrency scenarios become more common, Kafka can suffer from low throughput and high latency in operation, caused by improper configuration and skewed load across cluster nodes. To address these issues, this thesis proposes two performance optimization methods: automatic configuration tuning and load balancing. First, for automatic configuration, a generative adversarial network combined with a self-attention mechanism learns from well-performing configurations and generates good configuration combinations, which improve Kafka’s throughput and reduce its latency once deployed. Second, for load balancing, a performance comparison model based on random forest is constructed to select the most appropriate server in the Kafka cluster for task distribution, so that the cluster load is relatively uniform, which improves throughput and reduces latency. The specific research content and innovations of this thesis are as follows:

(1) An Automatic Configuration Tuning using Self-Attention and Generative Adversarial Network (ACT-SAGAN) algorithm is proposed. It addresses the low throughput and high latency that arise when processing large amounts of data because ordinary users do not understand Kafka’s configuration parameters deeply enough to tune them for a specific application environment. First, a self-attention mechanism is added to the generative adversarial network model to capture the hidden structures and the correlations among configuration parameters in well-configured combinations. Second, these hidden structures and correlations are used to generate better configuration combinations, which improve Kafka’s performance when deployed. This method reduces the number of system runs and does not require building a prediction model, which significantly improves tuning efficiency. Finally, experimental results show that, compared with Kafka in its default configuration, the algorithm increases throughput by 78.60%, reduces average latency by 26.95%, and reduces maximum latency by 39.84%.

(2) A load balancing performance improvement algorithm based on a Kafka performance comparison model is proposed. It addresses Kafka’s inability to perceive load status information in the cluster: when large amounts of data are processed, an unbalanced load strategy causes data skew, and the resulting high load on single nodes leads to low throughput and high latency. First, the effects of CPU utilization, memory utilization, and disk utilization on Kafka’s performance are considered, so that new tasks are handled according to the load status of the cluster servers. Second, a performance comparison model is established to predict the performance ranking of a new task on each node. Finally, the highest-ranked server node is selected, which keeps the cluster load relatively uniform and improves Kafka’s performance. Experimental results show that, compared with Kafka in its default state, distributing tasks to the servers selected by the algorithm in this thesis increases throughput by 57.32%, reduces average latency by 24.31%, and reduces maximum latency by 37.329%.
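
The node selection step of the load balancing method (score each server from its CPU, memory, and disk utilization, rank the servers, and send the new task to the top-ranked one) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the names `NodeLoad`, `headroom_score`, `rank_nodes`, and `select_node`, the broker names, and the utilization figures are all hypothetical, and the scoring function is a simple stand-in (mean unused capacity) used in place of the trained random forest performance comparison model, which the abstract does not specify in detail.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class NodeLoad:
    """Load snapshot of one server; every field is a utilization in [0, 1]."""
    cpu: float
    memory: float
    disk: float


def headroom_score(load: NodeLoad) -> float:
    """Hypothetical stand-in for the trained model: mean unused capacity.

    A real system would replace this with the prediction of a model
    trained on measured Kafka performance.
    """
    return ((1 - load.cpu) + (1 - load.memory) + (1 - load.disk)) / 3


def rank_nodes(
    loads: Dict[str, NodeLoad],
    predict: Callable[[NodeLoad], float] = headroom_score,
) -> List[str]:
    """Return node names ordered from best to worst predicted performance."""
    return sorted(loads, key=lambda name: predict(loads[name]), reverse=True)


def select_node(
    loads: Dict[str, NodeLoad],
    predict: Callable[[NodeLoad], float] = headroom_score,
) -> str:
    """Pick the highest-ranked node as the target for a new task."""
    if not loads:
        raise ValueError("no nodes available")
    return rank_nodes(loads, predict)[0]


# Made-up utilization figures for three hypothetical brokers.
cluster = {
    "broker-1": NodeLoad(cpu=0.90, memory=0.70, disk=0.40),
    "broker-2": NodeLoad(cpu=0.30, memory=0.50, disk=0.60),
    "broker-3": NodeLoad(cpu=0.60, memory=0.20, disk=0.80),
}

print(rank_nodes(cluster))
print(select_node(cluster))
```

Passing the predictor as a parameter keeps the ranking and selection logic independent of the model, so a trained regressor's prediction function could be supplied as `predict` without changing the rest of the sketch.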

Keywords/Search Tags:

Kafka, Performance optimization, Generative Adversarial Networks (GANs), Random forest, Load balancing

Related items

1	The Optimization Research Of Spark Load Balancing And Random Forest Algorithm
2	Research On Generative Model Theory And Application
3	Research And Application Of Image Data Augmentation Technology Based On Generative Adversarial Networks
4	Few-Shot Image Generation Based On Generative Adversarial Networks
5	Research On Payment Fraud Prediction Based On Random Forest
6	Automatic Architecture Optimization Strategy Of Generative Adversarial Networks
7	Generative Learning Approach For Face Blur Image Recovery
8	Research On Load Balancing Of Netty Internet Of Things Server Cluster Based On Kafka
9	Research On Generative Adversarial Networks With Multiple Random Projections Based On Random Discard
10	Missing Data Imputation Using Boosting Generative Adversarial Nets