Research On Automatic Deployment Of Distributed Computing Resources For Big Data Systems

Posted on:2018-02-23

Degree:Master

Type:Thesis

Country:China

Candidate:H Li

Full Text:PDF

GTID:2348330515951715

Subject:Computer application technology

Abstract/Summary:

In the 21st century we have entered a new period: the era of big data. Data is now regarded as a source of wealth, and big data has rapidly driven the development of cloud computing. Cloud computing has become a new business model and has attracted growing attention from industry, academia and society. The "cloud" serves fixed and mobile users worldwide, providing computing resources in the form of Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). These resources are delivered over the Internet, and users can choose to pay by usage or to have resources allocated to them. Because resource usage is uncertain, sizing the resource capacity of clusters and applications built on a cloud platform is a double-edged sword: it can lead to either inadequate or excessive supply. For cloud tenants, requesting too many resources wastes resources and raises costs; for the cloud service provider, supplying too many resources to tenants lowers overall resource utilization. Resource scheduling problems in cloud computing are considered to be as difficult as non-deterministic polynomial (NP) optimization problems. To improve resource utilization, the research in this thesis is carried out at two levels: scheduling inside the cluster and adjusting the scale of the cluster.

(1) I examined the Hadoop architecture, the MapReduce computing framework and the HDFS file system, and then studied the three scheduling algorithms supported by Hadoop. This study revealed shortcomings in the existing algorithms. A self-learning method was applied to resource scheduling, and a feature-weighted Naive Bayesian scheduling algorithm is proposed. The experimental results show that, when running WordCount jobs, the feature-weighted Naive Bayesian scheduling algorithm takes less time and achieves higher resource utilization than Hadoop's default scheduling algorithm.

(2) When the overall resources of a Hadoop cluster are under-supplied, resources become saturated; when they are over-supplied, resources are wasted. Combining the cloud platform OpenStack with the big data tool Hadoop, a system that dynamically adjusts the scale of the cluster was designed. The system consists of three modules: monitoring, scheduling and virtual machine management. In the scheduling module, timer-based adjustment only handles jobs that are periodic and stable. Threshold-based adjustment can handle almost all cases, but it delays the supply of resources. This thesis therefore proposes a time series workload forecasting algorithm based on SVM. Because the accuracy of the forecast has a crucial influence on decision-making, both the SVM algorithm and the ARMA algorithm were used to predict the workload time series. The experimental results show that the SVM model predicts more accurately than the ARMA model for workloads that follow growth and irregular patterns.
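
As an illustration of the feature-weighted Naive Bayesian scheduling idea in part (1), the following is a minimal Python sketch, not the thesis's implementation. It assumes categorical features, a score of the form log P(c) + sum_i w_i * log P(x_i | c), and Laplace smoothing. The feature names (CPU demand, I/O demand, node load), the weights, the "good"/"bad" labels and the sample history are all hypothetical; the abstract does not state how the features or weights were chosen.

```python
import math
from collections import Counter, defaultdict


class WeightedNaiveBayes:
    """Categorical naive Bayes; feature i's log-likelihood is scaled by weights[i]."""

    def __init__(self, weights, alpha=1.0):
        self.weights = weights                    # one weight per feature (assumed given)
        self.alpha = alpha                        # Laplace smoothing constant
        self.class_counts = Counter()             # label -> number of samples
        self.value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
        self.values = defaultdict(set)            # feature index -> values seen so far

    def update(self, features, label):
        """Self-learning step: fold one observed scheduling outcome into the counts."""
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.value_counts[(i, label)][value] += 1
            self.values[i].add(value)

    def log_score(self, features, label):
        """Return log P(label) + sum_i w_i * log P(x_i | label), with smoothing."""
        total = sum(self.class_counts.values())
        n_classes = len(self.class_counts)
        n_label = self.class_counts[label]
        score = math.log((n_label + self.alpha) / (total + self.alpha * n_classes))
        for i, value in enumerate(features):
            count = self.value_counts[(i, label)][value]
            n_values = len(self.values[i] | {value})
            likelihood = (count + self.alpha) / (n_label + self.alpha * n_values)
            score += self.weights[i] * math.log(likelihood)
        return score

    def predict(self, features):
        """Return the label with the highest weighted score."""
        return max(self.class_counts, key=lambda c: self.log_score(features, c))


def pick_job(classifier, node_load, queue):
    """Pick the queued job most likely to be a 'good' fit for a node in this state."""
    def margin(job):
        features = job + (node_load,)
        return (classifier.log_score(features, "good")
                - classifier.log_score(features, "bad"))
    return max(queue, key=margin)


# Hypothetical history: (cpu_demand, io_demand, node_load) -> did the node cope?
history = [
    (("high", "low", "idle"), "good"),
    (("high", "low", "busy"), "bad"),
    (("low", "high", "busy"), "good"),
    (("high", "high", "busy"), "bad"),
    (("low", "low", "idle"), "good"),
    (("low", "low", "busy"), "good"),
]

# Hypothetical weights: node load counts most, I/O demand least.
clf = WeightedNaiveBayes(weights=[1.0, 0.5, 1.5])
for features, label in history:
    clf.update(features, label)

print(clf.predict(("high", "low", "busy")))
print(pick_job(clf, "busy", [("high", "low"), ("low", "high")]))
```

In this sketch, a node that requests work would call `pick_job` with its current load, and the outcome of running the chosen job would later be fed back through `update`, which is one way to read the "self-learning" aspect mentioned above. Setting every weight to 1.0 reduces the score to ordinary Naive Bayes.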

Keywords/Search Tags:

Cloud computing, Resource scheduling, Naive Bayesian classification, SVM, Cluster size adjustment

Related items

1	Research Of Cloud Computing Scheduling Algorithm Based On Bayesian Model
2	Design And Implementation Of Scheduling Strategy For Distributed Resource Scheduling Platform
3	Research And Implementation Of Job Scheduling Algorithm In Cloud Computing
4	Research On Classification Algorithm Used HADOOP
5	Research On Scheduling Technology Of Cloud Service Resource Based On Game Model
6	Research On Cloud Service Deployment And Management Mechanisms In Cloud Computing
7	On The Metric Of Cloud Resources And The Corresponding Scheduling
8	Research On Construction Of Elastic Cluster Based On Docker And Resource Pre-scheduling Strategy
9	A Study Of Trace-driven Resource Scheduling Simulator In Large-scale Cluster
10	Research On Virtual Resource Scheduling Strategy In Cloud Computing Environment