With the advancement of urban intelligence, the potential value of massive travel data needs to be explored and developed. Data has gradually become an important enterprise asset and the core driving force for enabling business and promoting product innovation. In the past, the lack of overall strategic planning led to data silos in the integration and use of data. In addition, problems such as inconsistent statistical definitions and inconsistent data standards kept emerging. These problems prevent the back-end system from responding quickly to front-end business needs and from driving business innovation and change. A data middle platform gathers and processes data centrally, builds reusable common capabilities and shared data assets, and manages data at the strategic level, so that data truly becomes an enterprise asset. This thesis designs a one-stop data middle platform architecture for public transport, uses the capabilities of this platform to build a full-link data service application based on Shen Zhen Tong card data, and finally tests the data generation service and the data collection service. By starting from a high-value dataset in a narrowly scoped scenario (i.e. Shen Zhen Tong card data), the thesis puts the data middle platform capabilities into practice and quickly verifies both the value of the data and the capabilities of the platform. The main contents of this thesis are as follows:

(1) The functional and performance requirements of a one-stop data middle platform for public transport are analyzed. Based on these requirements, the software selection relies mainly on open-source software from the Apache big data ecosystem, supplemented by big data framework components commonly used in industry, to design the overall architecture of the platform. The overall design consists of four parts: data platform capability, data governance, data mining and analysis, and data application. Its layered structure and clear division of responsibilities make the construction steps clear and the functions specific, providing a reference architecture for the industry.

(2) A full-link data service application is built on Shen Zhen Tong card data. Five functional modules are implemented: data collection, offline data analysis, real-time data analysis, data governance, and data service; together they form a one-stop, full-link data service. In terms of technology selection, the data collection module uses Flume for multi-node data collection and aggregation; the offline data analysis module uses HDFS + Spark + Hive to build the layered data warehouse model and perform data analysis; the real-time data analysis module uses Kafka + Spark Structured Streaming to compute real-time data indicators; the data governance module uses Atlas to build the data lineage model; and the data service module uses Ganglia to monitor Flume, Kafka Eagle to monitor Kafka, DBeaver to query the offline and real-time views in Hive and MySQL, Presto and Kylin for ad hoc queries, and DataEase to design and implement the data visualization dashboard. In terms of architecture, the application adopts the Lambda architecture, which isolates complexity by separating stream processing from batch processing. The overall framework has a degree of fault tolerance and robustness and demonstrates the data service capability of the one-stop data middle platform.

(3) A test environment for the data middle platform is built and the relevant functions are tested. The cluster hardware environment and the allocation of service roles across servers are planned, software services are deployed on the cluster to give it general platform service capability, and the data generation and data collection services are tested.
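The Lambda architecture named in (2) runs a batch path and a streaming path side by side and merges their results at query time. The following is a minimal sketch of that idea in plain Python, under stated assumptions: in-memory lists stand in for the Hive/HDFS master dataset and the Kafka stream, and the `CardSwipe` record, its fields, the per-station swipe-count indicator, the cutoff value, and the sample stations are all hypothetical illustrations, not the thesis's actual schema or metrics.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CardSwipe:
    # Hypothetical card-swipe record; not the real Shen Zhen Tong schema.
    card_id: str
    station: str
    ts: int  # event time in seconds


def batch_view(master: Iterable[CardSwipe], cutoff: int) -> Counter:
    """Batch layer: recompute per-station counts from all events before cutoff."""
    return Counter(e.station for e in master if e.ts < cutoff)


def speed_view(recent: Iterable[CardSwipe], cutoff: int) -> Counter:
    """Speed layer: incrementally count only events the batch run has not covered."""
    view = Counter()
    for e in recent:
        if e.ts >= cutoff:  # skip events already in the batch view (no double counting)
            view[e.station] += 1
    return view


def serve(batch: Counter, speed: Counter) -> dict:
    """Serving layer: merge the batch and real-time views when a query arrives."""
    return dict(batch + speed)


# Made-up sample events; a single list plays both the master dataset and the stream.
events = [
    CardSwipe("c1", "Futian", 100),
    CardSwipe("c2", "Futian", 200),
    CardSwipe("c3", "Luohu", 250),
    CardSwipe("c1", "Luohu", 400),
    CardSwipe("c4", "Futian", 450),
]
CUTOFF = 300  # hypothetical time up to which the last batch run has processed data

print(serve(batch_view(events, CUTOFF), speed_view(events, CUTOFF)))
# prints {'Futian': 3, 'Luohu': 2}
```

The cutoff is what keeps the two paths independent: the batch layer can be slow and recomputed from scratch, while the speed layer only has to handle the small window after the last batch run, and its partial results can be discarded once a new batch run covers that window.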