| As big data and distributed technologies mature, databases play an increasingly important role in business systems. Improving database throughput and reducing operation response latency in the face of massive data has become an important issue for Internet companies. In this context, making full use of hardware resources while meeting the performance requirements of online systems under a low-cost hardware resource allocation scheme has become both a practical problem and a research topic. To address the database resource configuration optimization problem, this paper draws on the literature on distributed systems and on the use of common databases, and selects the distributed data storage system HBase as the research object. The random forest algorithm is used to model the relationship between hardware configuration parameters and HBase throughput and response latency. An improved particle swarm optimization algorithm is designed to optimize the mathematical model of resource allocation and capital cost. The optimization results are verified in the experimental environment, achieving the research goal of optimizing the resource allocation scheme. The research consists of the following main steps: (1) Generate the experimental plan. The hardware configuration parameters related to HBase performance are summarized from the related literature. The value range of each characteristic parameter is determined according to the hardware resources of the existing experimental environment. The experimental scheme is then obtained using the orthogonal experimental design method, and the experiments are conducted on the HBase cluster according to this scheme. (2) Construct a predictive model. The data produced by the experiments serve as training inputs to the random forest algorithm. A performance prediction model is built and verified using cross-validation. The error rate of each model is calculated according to the model verification method and serves as the evaluation index of prediction quality. (3) Optimize the target problem. The relationship between resource allocation and capital cost is modeled mathematically to obtain the target problem. The improved particle swarm optimization algorithm is designed to solve the target problem; the procedure includes population initialization, fitness calculation, extreme value replacement, and particle state update. Finally, the performance prediction model and the improved particle swarm optimization algorithm are verified and the results are analyzed. Experiments on 2,500 sets of samples are run on the cluster with the YCSB test tool to obtain the performance target values. The error rates of performance prediction models built with the random forest, Support Vector Regression, Artificial Neural Network, and Decision Tree algorithms are compared. The optimization results and convergence speed of the improved particle swarm optimization algorithm are compared with those of the genetic algorithm, the standard particle swarm optimization algorithm, and the simulated annealing algorithm. The experiments show that the improved particle swarm optimization algorithm proposed in this paper reduces the original configuration cost by 25.6%. Compared with the optimization results of the standard particle swarm optimization algorithm, the genetic algorithm, and the simulated annealing algorithm, this is an improvement of 17.8%, 8.8%, and 6.9%, respectively. These results show that the resource allocation optimization method proposed in this paper can obtain the optimal resource allocation scheme under the given throughput and response latency requirements. |
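
The optimization loop named in step (3) (population initialization, fitness calculation, extreme value replacement, particle state update) can be illustrated with a minimal sketch. It is a standard inertia-weight particle swarm, not the improved variant developed in the paper, whose details the abstract does not give. Everything concrete here is an assumption made for illustration: the three resource dimensions, the unit prices in `PRICES`, the bounds, the performance thresholds, the penalty weights, and the toy `predict` function that stands in for the trained random forest model.

```python
import random

# All constants below are hypothetical; the abstract does not specify them.
PRICES = (40.0, 5.0, 0.1)          # assumed unit cost: CPU core, GB RAM, GB disk
LOWER = (2.0, 4.0, 100.0)          # assumed lower bound per resource
UPPER = (32.0, 128.0, 2000.0)      # assumed upper bound per resource
MIN_THROUGHPUT = 5000.0            # assumed throughput requirement (ops/s)
MAX_LATENCY = 10.0                 # assumed latency requirement (ms)


def cost(x):
    """Capital cost of a configuration (linear in each resource)."""
    return sum(price * value for price, value in zip(PRICES, x))


def predict(x):
    """Toy performance model standing in for the trained random forest."""
    cores, ram, disk = x
    throughput = 300.0 * cores + 40.0 * ram + 0.5 * disk
    latency = 200.0 / (cores + 0.25 * ram)
    return throughput, latency


def fitness(x):
    """Cost plus a penalty for violating either performance requirement."""
    throughput, latency = predict(x)
    shortfall = max(0.0, MIN_THROUGHPUT - throughput)
    excess = max(0.0, latency - MAX_LATENCY)
    return cost(x) + 100.0 * shortfall + 100000.0 * excess


def pso(n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(LOWER)
    # Population initialization: random positions inside the bounds.
    pos = [[rng.uniform(LOWER[d], UPPER[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Fitness calculation for the initial population.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            # Particle state update: velocity, then position clamped to bounds.
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(UPPER[d], max(LOWER[d], pos[i][d] + vel[i][d]))
            # Extreme value replacement: update personal and global bests.
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    best, best_fit = pso()
    print("configuration:", best, "penalized cost:", best_fit)
```

Handling the throughput and latency requirements through a penalty term keeps the search unconstrained, which is one common choice; in a real setting `predict` would be replaced by the fitted prediction model and the constants by measured prices and service-level targets.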