Design And Implementation Of Data Splice System Based On Stream Computing

Posted on: 2020-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: W C Liang
GTID: 2518306104495424
Subject: Software engineering
Abstract/Summary:
Baidu Fengchao is Baidu's commercial advertising system. Relying on hundreds of millions of searches per day, it benefits both Baidu and the merchants who advertise through it, and it is a very important source of income. However, online operations and user feedback show that the length of Fengchao's click log affects the use of upstream and downstream systems, degrades the user experience, and even affects advertising revenue. Some fields in the click log are already present in the display log. A system is therefore needed to extract the information from the display log in real time and stitch it into the click log, so that the original click log can be shortened. To process a large amount of data in a timely manner, the system must ensure low latency, high stability, and scalability.

The thesis introduces the project background of the streaming data splicing system, describes the state of research on streaming systems in China and abroad, and explains the technical background and the technologies and frameworks used in the project. It then analyzes the system requirements, both functional and non-functional, presents the overall design and the sub-module design of the system, and describes the implementation of several key modules in detail. Finally, the functions of the system modules and the differences (diff) between logs are tested.

The streaming data splicing system introduced in this thesis is a real-time advertising data processing engine. From the recorded advertisement display logs and the advertisement logs generated in real time by the search engine, it produces a click log that carries display information. Downstream systems use this data to compute better statistics, so that advertisers can better adjust their delivery plans, which benefits users, advertisers, and Baidu alike. The system is built on Task Manager, Baidu's self-developed stream computing framework, combined with Bigpipe, Table, HDFS, and other related technologies. In its application to the Baidu Fengchao advertising system, the streaming data splicing system is used to shorten the click string, which improves the user experience. The input advertising log data stream processed by the system reaches tens of terabytes per day, and the generated "new" click log data stream reaches hundreds of gigabytes.
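The splicing step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation and does not use Task Manager, Bigpipe, or Table: the in-memory `DisplayStore`, the join key `show_id`, and all field names are hypothetical, chosen only to show how a short click record might be enriched with fields from the matching display record.

```python
from typing import Dict, Iterable, Iterator, Optional


class DisplayStore:
    """In-memory stand-in (hypothetical) for a key-value store of display records."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def put(self, display: dict) -> None:
        # Index each display record by its join key (assumed name: show_id).
        self._records[display["show_id"]] = display

    def get(self, show_id: str) -> Optional[dict]:
        return self._records.get(show_id)


def splice(click: dict, store: DisplayStore) -> dict:
    """Return the click record enriched with fields from its display record."""
    display = store.get(click["show_id"])
    if display is None:
        # No matching display record: pass the click through, marked as unspliced.
        return dict(click, spliced=False)
    merged = dict(display)   # start from the display fields
    merged.update(click)     # click fields take precedence on key conflicts
    merged["spliced"] = True
    return merged


def splice_stream(clicks: Iterable[dict], store: DisplayStore) -> Iterator[dict]:
    """Lazily splice each click as it arrives from the input stream."""
    for click in clicks:
        yield splice(click, store)
```

In this sketch the click record only needs to carry the join key and its click-specific fields, which is the sense in which splicing lets the original click log be shortened. A production system would also need expiry of old display records and handling of clicks that arrive before their display record, which are omitted here.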
Keywords/Search Tags:Internet advertising, Massive data, Stream processing, Task Manager