Event tracking refers, in data collection, to capturing, processing, and reporting user actions at the "point of operation" where the data is generated. It is used to track how applications are used and provides operational data support, such as which pages are accessed and how many times they are visited. When a data-driven enterprise makes business decisions, it often needs a high-quality event tracking system to collect, process, store, and compute event data, so this thesis topic has practical application value.

Existing event tracking systems evolved from offline data warehouses and adopt the Lambda architecture to serve both offline and real-time demands. This is a general big data processing framework with the advantage of high stability, but it incurs high development and operations costs and is prone to inconsistencies between offline and real-time statistics. To address these disadvantages, this thesis designs and implements a system based on unified streaming and batch processing that can clean, process, store, and compute event data. Unified streaming and batch processing means using a single, unified method for both streaming and batch computation, guaranteeing that the processing logic and its results remain consistent. The system uses this approach to build an event warehouse that supports switching the execution mode of computing tasks within the same cluster, reducing development and operations costs. The system also defines a flexible and complete event format, which provides a solid foundation for subsequent data analysis. In the reporting process, the system reduces the volume of event data to be transmitted through batch reporting, extraction of common fields, and compression. In addition to integrating and scaling advanced big data storage and processing systems, the thesis also meets non-functional requirements by improving the system workflow and architecture, such
as using a publish-subscribe pattern between modules to improve reliability. Following software engineering methodology, the thesis carries out requirement analysis, overall architecture design, module design, and code implementation for the tracking system. The system uses Apache Flink as the unified streaming and batch processing engine, uses Apache Kafka to buffer event data as a data source for downstream consumers, and uses the table format Apache Hudi to support both streaming and batch reads and writes. The system has passed functional and non-functional tests, covers the whole pipeline of event data collection, processing, storage, and computation, and supports enterprises in product improvement, operations optimization, marketing analysis, and business decision-making, thereby helping the business grow.
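The reporting optimization mentioned above (batch reporting, extraction of common fields, and compression) can be sketched as follows. This is a minimal illustration, not the thesis's actual event format: the field names (`device_id`, `app_version`, `user_id`) and the JSON-plus-gzip wire format are assumptions chosen for the example.

```python
import gzip
import json

# Fields that tend to repeat across every event in a batch and are
# therefore worth lifting into a shared header (illustrative names).
COMMON_FIELDS = ("device_id", "app_version", "user_id")

def build_report(events):
    """Pack a batch of event dicts into one compressed payload.

    Fields whose value is identical across the whole batch are moved
    into a single "common" header, then the payload is gzip-compressed.
    """
    if not events:
        return gzip.compress(b"{}")
    common = {
        k: events[0][k]
        for k in COMMON_FIELDS
        if k in events[0] and all(e.get(k) == events[0][k] for e in events)
    }
    slim = [
        {k: v for k, v in e.items() if k not in common}
        for e in events
    ]
    body = json.dumps({"common": common, "events": slim},
                      separators=(",", ":"))
    return gzip.compress(body.encode("utf-8"))

def parse_report(payload):
    """Inverse of build_report: restore the full per-event records."""
    doc = json.loads(gzip.decompress(payload).decode("utf-8"))
    common = doc.get("common", {})
    return [{**common, **e} for e in doc.get("events", [])]
```

Batching amortizes per-request overhead, the common/per-event split removes redundant fields before they are ever serialized, and compression shrinks what remains; the server side reverses all three steps losslessly.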
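The core idea of unified streaming and batch processing can be shown with a toy sketch: the aggregation logic is written once and reused by both a batch runner (which sees the whole bounded dataset) and a streaming runner (which sees events one at a time), so the two modes cannot produce inconsistent results. This is only an illustration of the principle; in the actual system this role is played by Apache Flink, whose unified DataStream API can execute the same job in streaming or batch mode as a configuration choice rather than a code change.

```python
from collections import Counter
from typing import Iterable, Iterator

def aggregate(counts: Counter, event: dict) -> Counter:
    """Single processing definition: count page views per page."""
    if event.get("event") == "page_view":
        counts[event["page"]] += 1
    return counts

def run_batch(events: Iterable[dict]) -> Counter:
    """Batch mode: fold the whole bounded input at once."""
    counts = Counter()
    for e in events:
        aggregate(counts, e)
    return counts

def run_streaming(events: Iterable[dict]) -> Iterator[Counter]:
    """Streaming mode: emit an updated result after every event."""
    counts = Counter()
    for e in events:
        yield aggregate(counts, e).copy()
```

Because both runners delegate to the same `aggregate` function, the final streaming emission necessarily equals the batch result, which is exactly the consistency guarantee the thesis seeks from unified processing.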