Design And Implementation Of Baidu Feed User Behavior Data Warehouse

Posted on: 2019-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Cheng
Full Text: PDF
GTID: 2428330566497296
Subject: Software engineering
Abstract/Summary:
In the age of intelligent information services, using big data together with analysis models and algorithms to provide personalized services, and thereby increase users' reliance on a product, has become a mainstream trend. Baidu has a very large user base, and analyzing users' interests through their behavior logs in order to offer the most suitable services has become one of its goals. Baidu's information flow is built on billions of user data records; it interprets user behavior and identifies user needs from crowd attributes, search intent, behavioral data, and interests. The Feed stream was Baidu's key business in 2017. Given the growth momentum of information-stream advertising across the advertising market, Baidu Feed stream advertising is expected to become a new growth point, and Feed stream services bring Baidu substantial economic benefits. As the Feed stream has become a key business, various analysis services have gradually been built on top of it to improve the business further and to understand user needs better.

To meet the need for Feed stream data management and analysis, this paper presents the analysis, design, and implementation of a Feed user behavior data warehouse on the Hadoop platform, which outputs detail-level data and subject-level data on Feed user behavior. First, the paper analyzes the product form of Feed and the log sources required to build the data warehouse, along with the functional requirements and data dimensions of the warehouse. Second, it gives a detailed design at both the architecture level and the data model level. The architecture design follows the business characteristics closely, and the data warehouse architecture and ETL development process are chosen to suit them. The paper then focuses on its core topic, the construction of the Feed user behavior data warehouse, which includes establishing the basic data warehouse, ETL data analysis and processing, automated production of the detail-layer basic data, construction of the topic layer, and presentation of data reports. Finally, the data quality, functionality, and performance of the data warehouse are tested and verified, and the data it manages is checked across multiple dimensions to ensure that the data it provides is correct and usable.

This work establishes a unified, standardized, and usable basic data warehouse for Feed user behavior, combining the classic data warehouse construction process with the actual characteristics of the business. The data warehouse has been tested and is in use online, providing stable and reliable basic data on Feed user behavior to the anti-spam team. That team uses data analysis and other techniques to filter cheating data out of the user logs, so that business computations such as advertising traffic calculation count only genuine user behavior logs. This has made a significant contribution to the company and demonstrates the practicality and value of the data warehouse.
Keywords/Search Tags:big data, data warehouse, ETL, Feed information flow