Cities generate large volumes of spatio-temporal data, such as ride-hailing vehicle trajectories, cell phone signaling data, and road traffic data. These data are often enormous in volume; they constitute a reflection of the real world in the virtual digital world and contain great value. However, the emergence of urban data brings not only opportunities but also challenges to industry, one of which is the computing-power problem in urban data analysis. The large volume and multimodal heterogeneity of urban big data impose a huge computational load. At the same time, Moore's Law is approaching its end, and it has become difficult to sustain continued growth in single-core CPU performance. Therefore, using distributed computing and heterogeneous computing technologies to accelerate urban big data analysis and to deploy applications with high performance has great value for both practical application and theoretical research.

The research goal of this paper is to design a distributed back-end computing engine based on open source tools that effectively schedules hardware computing resources, including CPU, GPU, storage, and communication, in urban big data analysis scenarios, so that various urban big data mining tasks can be developed, operated, and deployed on a high-performance server cluster in a stable, high-performance, and easy-to-use manner. The main work and contributions of this paper are as follows.

1. Requirement analysis of the tasks targeted by the computing engine. This paper first divides the computational tasks to be handled by the engine into two categories: mining tasks over large-scale data and highly concurrent interactive computing tasks. It then analyzes the characteristics of each category, including its computing process, likely computing-power bottlenecks, corresponding acceleration schemes, and computing architecture.

2. Scheme selection and architecture design of the computing engine. Based on the above analysis and a survey of open source tools and distributed frameworks, this paper builds a complete back-end computing engine on open source technologies including Docker for containerization, Kubernetes for distributed resource management and container scheduling, Spark as the distributed computing engine, and Hadoop as the distributed storage system. Its functions include distributed storage and management of large-scale data, real-time scheduling of heterogeneous computing resources, high-performance large-scale data analysis, and support for interactive computing.

3. Testing of selected cases on the computing engine in practice. Two practical applications for urban planning and management are chosen as cases of the two task types and are deployed and accelerated in a distributed manner on the designed computing engine, in order to verify the correctness and high performance of the architecture design. For the large-scale data mining task, a residency-detection algorithm is run over 150 million LBS records; the distributed deployment achieves a speedup of 360 times over a single machine.
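To make this case concrete, the sketch below shows how a stay-point style residency detection pass over LBS records might be expressed on such an engine with PySpark, so that Spark parallelizes the per-user computation across the cluster. The column names, HDFS paths, and distance/dwell thresholds are illustrative assumptions and do not reflect the actual implementation evaluated in the paper.

```python
# Minimal PySpark sketch of residency (stay-point) detection over LBS records.
# Column names, thresholds, and HDFS paths are illustrative assumptions.
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("residency-detection").getOrCreate()

# Assumed schema: user_id, timestamp (unix seconds), lon, lat
points = spark.read.parquet("hdfs:///data/lbs/points")  # hypothetical path

DIST_M = 200        # assumed max radius of a stay, in metres
MIN_DWELL_S = 1800  # assumed minimum dwell time, in seconds

def detect_stays(pdf: pd.DataFrame) -> pd.DataFrame:
    """Scan one user's points in time order and emit stay segments."""
    pdf = pdf.sort_values("timestamp")
    lon, lat, ts = pdf["lon"].values, pdf["lat"].values, pdf["timestamp"].values
    stays, start = [], 0
    for i in range(1, len(pdf) + 1):
        # crude planar distance from the segment's first point (sketch only)
        if i < len(pdf) and (
            abs(lon[i] - lon[start]) * 111e3 < DIST_M
            and abs(lat[i] - lat[start]) * 111e3 < DIST_M
        ):
            continue
        if ts[i - 1] - ts[start] >= MIN_DWELL_S:
            stays.append((pdf["user_id"].iloc[0],
                          float(lon[start:i].mean()), float(lat[start:i].mean()),
                          int(ts[start]), int(ts[i - 1])))
        start = i
    return pd.DataFrame(stays, columns=["user_id", "lon", "lat", "start_ts", "end_ts"])

stay_df = points.groupBy("user_id").applyInPandas(
    detect_stays,
    schema="user_id string, lon double, lat double, start_ts long, end_ts long",
)
stay_df.write.mode("overwrite").parquet("hdfs:///data/lbs/stays")  # hypothetical path
```

Because the per-user detection function only sees one user's records at a time, this style of job scales out naturally when more executors are added to the cluster.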
For the highly concurrent business deployment scenario, this paper selects a ResNet-based city foot traffic prediction application and implements its highly concurrent deployment by dynamically scheduling GPUs via Kubernetes. Furthermore, the paper uses a software-simulated load-bearing test scheme, in which programs simulate user behavior in order to test the carrying capacity of the cluster. The experimental results show that a cluster of nine servers can support more than 300 users working on the system simultaneously, and the paper also analyzes the impact of the number of GPU cards, the cluster size, and the communication conditions on overall performance.
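At the API level, dynamically scheduling GPUs via Kubernetes amounts to declaring GPU resource limits on the inference pods and letting the Kubernetes scheduler place them on nodes with free cards (this assumes the NVIDIA device plugin is installed on the cluster). The following sketch uses the official Kubernetes Python client; the image name, service name, labels, and replica count are hypothetical and are not taken from the paper.

```python
# Minimal sketch: deploy a GPU-backed inference service via the Kubernetes
# Python client. Image, names, labels, and replica count are assumptions.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when run inside the cluster

container = client.V1Container(
    name="footfall-predictor",                     # hypothetical service name
    image="registry.example.com/footfall:latest",  # hypothetical image
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"}             # one GPU per replica
    ),
    ports=[client.V1ContainerPort(container_port=8080)],
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="footfall-predictor"),
    spec=client.V1DeploymentSpec(
        replicas=4,  # scale replicas (and hence GPUs) with expected concurrency
        selector=client.V1LabelSelector(match_labels={"app": "footfall"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "footfall"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```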
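The load-bearing test is described only as program simulation of user behavior; one simple way to realize such a test is to drive the deployed service with many concurrent simulated clients and record request latencies, as sketched below. The endpoint URL, request payload, user count, and think times are hypothetical.

```python
# Minimal sketch of a load-bearing test driven by simulated concurrent users.
# The endpoint, payload, user count, and think time are hypothetical.
import random
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://cluster-gateway.example.com/predict"  # hypothetical URL
N_USERS = 300          # number of concurrent simulated users
REQUESTS_PER_USER = 20

def simulate_user(user_id: int) -> list:
    """One simulated user: issue requests with random think time, record latency."""
    latencies = []
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        resp = requests.post(ENDPOINT, json={"region_id": user_id % 64}, timeout=30)
        resp.raise_for_status()
        latencies.append(time.perf_counter() - start)
        time.sleep(random.uniform(0.5, 2.0))  # think time between user actions
    return latencies

with ThreadPoolExecutor(max_workers=N_USERS) as pool:
    all_latencies = [lat for result in pool.map(simulate_user, range(N_USERS))
                     for lat in result]

all_latencies.sort()
print(f"requests:    {len(all_latencies)}")
print(f"p50 latency: {all_latencies[len(all_latencies) // 2]:.3f}s")
print(f"p95 latency: {all_latencies[int(len(all_latencies) * 0.95)]:.3f}s")
```

Sweeping the simulated user count while watching tail latency and error rate is one way to estimate the carrying capacity reported for the nine-server cluster.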