
Design And Implementation Of Real-Time Data Warehouse Based On Big Data

Posted on: 2022-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: B W Lei
GTID: 2518306350989499
Subject: Master of Engineering
Abstract/Summary:
With the advent of the big data era, traditional data warehouses can no longer meet current real-time business needs, and big data tools emerged to process data more effectively. As these technologies have matured and businesses have demanded fresher results, real-time processing has become increasingly important. Conventional real-time computation considers only timeliness and does not store intermediate results, so when many real-time requirements must be served, data reusability is poor. It is therefore important to build a real-time data warehouse. This thesis reviews the evolution of data warehouse architecture and examines the offline big data architecture, the Lambda architecture and the Kappa architecture in detail. It compares the three mainstream stream processing frameworks, Storm, Spark Streaming and Flink, from several aspects and finds Flink superior for unified stream-batch processing and for real-time processing. The work is divided into four parts: data collection, data warehouse layering, large-screen display and risk monitoring. The main contents of each part are as follows:
1. Data acquisition module. Log data is collected from front-end event tracking points, while business data is written directly to a MySQL database and then captured by Maxwell.
2. Data warehouse layering module. The warehouse is divided into ODS, DWD, DWM, DWS and ADS layers, which makes real-time computations more reusable and gives the warehouse a clearer structure. The DWD layer splits and distributes data by data object. The DWM layer mainly builds the unique visitor (UV) computation, the user jump-out (bounce) detail computation, the order wide table and the payment wide table. The DWS layer aggregates wide tables for the visitor, product, region and keyword themes.
3. Large-screen display module. The result data in ClickHouse is connected to the Sugar tool through a web service data interface for visual large-screen display. The data interfaces are published by implementing the corresponding methods in the Controller, Service and Mapper layers. Based on business needs, the Sugar dashboard is divided into eight modules: total transaction amount, province and city heat map query, traffic by time of day, brand Top N, category distribution, hot words, traffic table and popular products.
4. Risk monitoring module. Flink CEP consumes data from Kafka in real time to perform risk monitoring and alerting, implementing blacklist filtering, malicious login detection and real-time order monitoring.
The Flink-based real-time data warehouse implemented in this thesis can connect to multiple data sources, display result data in real time and monitor user behavior for risk in real time. It can also connect seamlessly to an offline data warehouse, achieving genuine stream-batch integration, and meets enterprises' real-time data processing needs.
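The ODS-to-DWD split described in the data acquisition and layering modules can be illustrated with a minimal Python sketch. It assumes that Maxwell writes each row change of the MySQL business database into an ODS stream as a JSON object with `table`, `type` and `data` fields, and that the DWD layer routes each record to a per-table stream. The table and topic names in `DWD_TOPICS` and the function `route_maxwell_record` are hypothetical, since the abstract does not list them; a Flink job would perform the same routing inside a process function (for example with side outputs) rather than in plain Python.

```python
import json

# Hypothetical mapping from business table to DWD topic; the thesis does not
# name its tables or topics.
DWD_TOPICS = {
    "order_info": "dwd_order_info",
    "order_detail": "dwd_order_detail",
    "payment_info": "dwd_payment_info",
}


def route_maxwell_record(raw: str):
    """Parse one Maxwell change record and return (topic, row), or None to drop it.

    Assumes the record carries "table", "type" and "data" fields.
    """
    record = json.loads(raw)
    # This sketch only forwards inserts and updates; deletes are dropped.
    if record.get("type") not in ("insert", "update"):
        return None
    topic = DWD_TOPICS.get(record.get("table"))
    if topic is None:
        return None  # table is not routed to the DWD layer in this sketch
    return topic, record["data"]
```

Each returned pair would then be written to the named DWD stream, so that downstream DWM jobs subscribe only to the data objects they need.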
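The DWM unique visitor (UV) computation amounts to keeping only the first page view of each visitor per day. The sketch below mimics this with a dict standing in for Flink keyed state. The class name `DailyUvFilter`, the use of a visitor or device id as the key and the UTC day boundary are assumptions for illustration; the thesis may key and bucket days differently.

```python
from datetime import datetime, timezone


class DailyUvFilter:
    """Keep only the first page view of each visitor per calendar day (UTC).

    The dict plays the role of per-key state holding the visitor's last visit date.
    """

    def __init__(self):
        self._last_visit_date = {}

    def is_first_visit_today(self, visitor_id: str, ts_millis: int) -> bool:
        day = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc).date()
        if self._last_visit_date.get(visitor_id) == day:
            return False  # visitor already counted today
        self._last_visit_date[visitor_id] = day
        return True
```

In a long-running job, the per-visitor state would need an expiry (for example a state time-to-live of about one day); otherwise it grows without bound.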
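The DWS theme tables aggregate DWM data by dimension. Assuming the aggregation uses tumbling event-time windows (the abstract does not state the window type or size), window assignment and per-dimension counting reduce to the sketch below. The function `tumbling_window_counts` is hypothetical. Watermarks and late data, which Flink handles, are ignored here.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_ms):
    """Count events per (window_start, key) in tumbling windows of window_ms.

    events: iterable of (ts_millis, key) pairs, where key is a theme dimension
    such as a province or a search keyword.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Round the timestamp down to a window boundary. This matches Flink's
        # tumbling-window alignment for non-negative timestamps and zero offset.
        window_start = ts - ts % window_ms
        counts[(window_start, key)] += 1
    return dict(counts)
```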
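The Controller, Service and Mapper layering used to publish the dashboard data interfaces can be sketched in Python as three small classes. This sketch illustrates only the layering. The class names, the date key format and the in-memory mapper (which stands in for a query against ClickHouse) are hypothetical. The `{"status", "msg", "data"}` response envelope is an assumption about the JSON format Sugar expects and should be checked against the Sugar documentation.

```python
import json


class GmvMapper:
    """Data-access layer. A real mapper would query ClickHouse; this
    hypothetical stand-in reads from an in-memory dict keyed by date."""

    def __init__(self, gmv_by_date):
        self._gmv_by_date = gmv_by_date

    def select_gmv(self, date: str) -> float:
        return self._gmv_by_date.get(date, 0.0)


class GmvService:
    """Business layer: calls the mapper and rounds the amount to two decimals."""

    def __init__(self, mapper: GmvMapper):
        self._mapper = mapper

    def get_gmv(self, date: str) -> float:
        return round(self._mapper.select_gmv(date), 2)


class GmvController:
    """Web layer: turns a request parameter into the JSON body a dashboard polls.
    The status/msg/data envelope is an assumed format."""

    def __init__(self, service: GmvService):
        self._service = service

    def gmv(self, date: str) -> str:
        body = {"status": 0, "msg": "", "data": self._service.get_gmv(date)}
        return json.dumps(body)
```

Keeping SQL in the mapper and formatting in the controller lets each of the eight dashboard modules add a query without touching the response handling.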
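In the thesis, malicious login detection is implemented with Flink CEP, but the abstract does not give the exact pattern. Assuming a rule of the form "N consecutive failed logins by the same user within T milliseconds", the matching logic can be sketched in plain Python as shown below. This is not the Flink CEP API. The function `detect_malicious_logins`, the default threshold of 3 and the 2,000 ms window are hypothetical.

```python
from collections import defaultdict, deque


def detect_malicious_logins(events, threshold=3, within_ms=2000):
    """Return (user_id, first_ts, last_ts) alerts for `threshold` consecutive
    failed logins by one user within `within_ms` milliseconds.

    events: iterable of (user_id, ts_millis, success), in timestamp order per user.
    """
    streaks = defaultdict(deque)  # user_id -> timestamps of the current failure streak
    alerts = []
    for user_id, ts, success in events:
        streak = streaks[user_id]
        if success:
            streak.clear()  # a successful login breaks the "consecutive" condition
            continue
        streak.append(ts)
        # Discard failures that are too old relative to the newest failure.
        while ts - streak[0] > within_ms:
            streak.popleft()
        if len(streak) >= threshold:
            alerts.append((user_id, streak[0], ts))
            streak.clear()  # start a fresh streak after raising an alert
    return alerts
```

A successful login resets the streak, which corresponds to a strict-contiguity pattern bounded by a time window, as opposed to one that tolerates unrelated events in between.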
Keywords/Search Tags: real-time data warehouse, streaming, Flink, risk monitoring