Cities generate large volumes of spatio-temporal data, such as ride-hailing vehicle trajectories, cell phone signaling data, and road traffic data. These data are often enormous in volume; they constitute a reflection of the real world in the virtual digital world and contain great value. However, the emergence of urban data brings not only opportunities but also challenges to industry, one of which is the computing-power problem in urban data analysis. The large volume and multimodal heterogeneity of urban big data impose a huge computational load. At the same time, Moore's Law is approaching its end, and it has become difficult to sustain continued growth in single-core CPU performance. Therefore, using distributed computing and heterogeneous computing technologies to accelerate urban big data analysis and to deploy applications with high performance has great value for both practical application and theoretical research.

The research goal of this paper is to design a distributed back-end computing engine based on open source tools that effectively schedules hardware computing resources, including CPU, GPU, storage, and communication, in urban big data analysis scenarios, so that various urban big data mining tasks can be developed, operated, and deployed on a high-performance server cluster in a stable, high-performance, and easy-to-use manner. The main work and contributions of this paper are as follows.

1. Requirement analysis of the tasks targeted by the computing engine. This paper first divides the computational tasks to be handled by the engine into two categories: mining tasks over large-scale data and highly concurrent interactive computing tasks. It then analyzes the characteristics of each category, including its computing process, likely computing-power bottlenecks, corresponding acceleration schemes, and computing architecture.

2. Scheme selection and architecture design of the computing engine. Based on the above analysis and a survey of open source tools and distributed frameworks, this paper builds a complete back-end computing engine on open source technologies including Docker for containerization, Kubernetes for distributed resource management and container scheduling, Spark as the distributed computing engine, and Hadoop as the distributed storage system. Its functions include distributed storage and management of large-scale data, real-time scheduling of heterogeneous computing resources, high-performance large-scale data analysis, and support for interactive computing.

3. Testing of selected cases on the computing engine in practice. Two practical applications for urban planning and management are chosen as cases of the two task types and are deployed and accelerated in a distributed manner on the designed computing engine, in order to verify the correctness and high performance of the architecture design. For the large-scale data mining task, a residency-detection algorithm is run over 150 million LBS records; the distributed deployment achieves a speedup of 360 times over a single machine.
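To make this case concrete, the sketch below shows how a stay-point style residency detection pass over LBS records might be expressed on such an engine with PySpark, so that Spark parallelizes the per-user computation across the cluster. The column names, HDFS paths, and distance/dwell thresholds are illustrative assumptions and do not reflect the actual implementation evaluated in the paper.

```python
# Minimal PySpark sketch of residency (stay-point) detection over LBS records.
# Column names, thresholds, and HDFS paths are illustrative assumptions.
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("residency-detection").getOrCreate()

# Assumed schema: user_id, timestamp (unix seconds), lon, lat
points = spark.read.parquet("hdfs:///data/lbs/points")  # hypothetical path

DIST_M = 200        # assumed max radius of a stay, in metres
MIN_DWELL_S = 1800  # assumed minimum dwell time, in seconds

def detect_stays(pdf: pd.DataFrame) -> pd.DataFrame:
    """Scan one user's points in time order and emit stay segments."""
    pdf = pdf.sort_values("timestamp")
    lon, lat, ts = pdf["lon"].values, pdf["lat"].values, pdf["timestamp"].values
    stays, start = [], 0
    for i in range(1, len(pdf) + 1):
        # crude planar distance from the segment's first point (sketch only)
        if i < len(pdf) and (
            abs(lon[i] - lon[start]) * 111e3 < DIST_M
            and abs(lat[i] - lat[start]) * 111e3 < DIST_M
        ):
            continue
        if ts[i - 1] - ts[start] >= MIN_DWELL_S:
            stays.append((pdf["user_id"].iloc[0],
                          float(lon[start:i].mean()), float(lat[start:i].mean()),
                          int(ts[start]), int(ts[i - 1])))
        start = i
    return pd.DataFrame(stays, columns=["user_id", "lon", "lat", "start_ts", "end_ts"])

stay_df = points.groupBy("user_id").applyInPandas(
    detect_stays,
    schema="user_id string, lon double, lat double, start_ts long, end_ts long",
)
stay_df.write.mode("overwrite").parquet("hdfs:///data/lbs/stays")  # hypothetical path
```

Because the per-user detection function only sees one user's records at a time, this style of job scales out naturally when more executors are added to the cluster.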
For the highly concurrent business deployment scenario, this paper selects a ResNet-based city foot traffic prediction application and implements its highly concurrent deployment by dynamically scheduling GPUs via Kubernetes. Furthermore, the paper uses a software-simulated load-bearing test scheme, in which programs simulate user behavior in order to test the carrying capacity of the cluster. The experimental results show that a cluster of nine servers can support more than 300 users working on the system simultaneously, and the paper also analyzes the impact of the number of GPU cards, the cluster size, and the communication conditions on overall performance.
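At the API level, dynamically scheduling GPUs via Kubernetes amounts to declaring GPU resource limits on the inference pods and letting the Kubernetes scheduler place them on nodes with free cards (this assumes the NVIDIA device plugin is installed on the cluster). The following sketch uses the official Kubernetes Python client; the image name, service name, labels, and replica count are hypothetical and are not taken from the paper.

```python
# Minimal sketch: deploy a GPU-backed inference service via the Kubernetes
# Python client. Image, names, labels, and replica count are assumptions.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when run inside the cluster

container = client.V1Container(
    name="footfall-predictor",                     # hypothetical service name
    image="registry.example.com/footfall:latest",  # hypothetical image
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"}             # one GPU per replica
    ),
    ports=[client.V1ContainerPort(container_port=8080)],
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="footfall-predictor"),
    spec=client.V1DeploymentSpec(
        replicas=4,  # scale replicas (and hence GPUs) with expected concurrency
        selector=client.V1LabelSelector(match_labels={"app": "footfall"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "footfall"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```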
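The load-bearing test is described only as program simulation of user behavior; one simple way to realize such a test is to drive the deployed service with many concurrent simulated clients and record request latencies, as sketched below. The endpoint URL, request payload, user count, and think times are hypothetical.

```python
# Minimal sketch of a load-bearing test driven by simulated concurrent users.
# The endpoint, payload, user count, and think time are hypothetical.
import random
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://cluster-gateway.example.com/predict"  # hypothetical URL
N_USERS = 300          # number of concurrent simulated users
REQUESTS_PER_USER = 20

def simulate_user(user_id: int) -> list:
    """One simulated user: issue requests with random think time, record latency."""
    latencies = []
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        resp = requests.post(ENDPOINT, json={"region_id": user_id % 64}, timeout=30)
        resp.raise_for_status()
        latencies.append(time.perf_counter() - start)
        time.sleep(random.uniform(0.5, 2.0))  # think time between user actions
    return latencies

with ThreadPoolExecutor(max_workers=N_USERS) as pool:
    all_latencies = [lat for result in pool.map(simulate_user, range(N_USERS))
                     for lat in result]

all_latencies.sort()
print(f"requests:    {len(all_latencies)}")
print(f"p50 latency: {all_latencies[len(all_latencies) // 2]:.3f}s")
print(f"p95 latency: {all_latencies[int(len(all_latencies) * 0.95)]:.3f}s")
```

Sweeping the simulated user count while watching tail latency and error rate is one way to estimate the carrying capacity reported for the nine-server cluster.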