Design And Implementation Of Content Identify Module In Data-Management Platform

Posted on:2016-10-08

Degree:Master

Type:Thesis

Country:China

Candidate:C L Hu

Full Text:PDF

GTID:2308330470955568

Subject:Software engineering

Abstract/Summary:

With the rapid development of smart devices, more and more people are going online through a variety of such devices. The massive volume of user data generated in the process can yield tremendous value if it is used effectively. At the same time, as the cost of distributed clusters falls and distributed algorithms mature, analysing large quantities of data is becoming increasingly convenient and efficient. The project described in this thesis is an advertising-industry application built on distributed clusters and distributed algorithms.
By analysing the huge amounts of data that users generate while browsing the Internet, the project aims to find the most valuable audience, namely the people most likely to buy a product after seeing its advertisement, and thereby to turn untargeted advertising into precisely targeted advertising. The author's work covers the development of the crawler, the design of the architecture and rules of the content identification system, the development and testing of that system, and the analysis of system logs for advertising.
The project is developed in Java and Python and runs on Hadoop clusters. Identification rules (domain name, product, application, search keyword, CookieId, terminal type, User Agent, Token, etc.) are collected by a crawler (Scrapy) and stored in a relational database (MySQL). Using these rules, the project analyses, summarizes, and models the massive volume of users' online data stored in Hive, which the thesis classes as a NoSQL database, and then sends the results to Redis, a high-performance key-value database, so that the relevant users' data can be queried when ads are served.
The project aims to provide a decision-making basis for the ad bidding process, so that ads are shown to the people most likely to make a purchase; in other words, it aims to improve return on investment (ROI).
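
The rule-matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the rule format, the field names (`cookie_id`, `url`, `user_agent`), the tag names, the assumption that search keywords arrive in a `q` query parameter, and the functions `extract_features`, `identify`, and `build_profiles` are all hypothetical, chosen only to show how identification rules might be applied to browsing records and aggregated per user.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

# Hypothetical identification rules. In the project these are collected by
# the crawler and stored in MySQL; here they are hard-coded for illustration.
RULES = [
    {"field": "domain", "pattern": "shop.example.com", "tag": "online-shopping"},
    {"field": "keyword", "pattern": "running shoes", "tag": "sportswear"},
    {"field": "user_agent", "pattern": "iPhone", "tag": "terminal:ios"},
]


def extract_features(record):
    """Pull the fields that the rules inspect out of one browsing record."""
    url = urlparse(record["url"])
    query = parse_qs(url.query)
    return {
        "domain": url.hostname or "",
        # Assumption: the search keyword is carried in a "q" query parameter.
        "keyword": " ".join(query.get("q", [])),
        "user_agent": record.get("user_agent", ""),
    }


def identify(record, rules=RULES):
    """Return the set of tags whose rule pattern occurs in the record."""
    features = extract_features(record)
    return {
        rule["tag"]
        for rule in rules
        if rule["pattern"].lower() in features[rule["field"]].lower()
    }


def build_profiles(records, rules=RULES):
    """Aggregate tags per user, keyed by the (hypothetical) cookie_id field."""
    profiles = defaultdict(set)
    for record in records:
        profiles[record["cookie_id"]] |= identify(record, rules)
    return {user: sorted(tags) for user, tags in profiles.items()}
```

In a deployment along the lines the abstract describes, the records would come from Hive rather than an in-memory list, and the dictionary returned by `build_profiles` would be written to Redis under each user's key so that it can be looked up at bidding time; both steps are omitted here to keep the sketch self-contained.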

Keywords/Search Tags:

Distributed clusters, Distributed algorithm, NoSQL Database, Crawler, Content Identify, Advertising

Related items

1	Design And Implementation Of Distributed NoSql Database And Used In Jilin Social Insurance System
2	Implementation Of NoSQL Database Distributed Cache System
3	Design And Implementation Of Top-Scholar Talents Database System Based On Distributed Crawler
4	The Support For Distributed Database Transaction
5	Research On NoSQL Database Technology And Application
6	Improved Algorithm And Performance Optimization Of Distributed Storage System Based On NoSQL
7	Research And Implementation Of Distributed Web Crawler
8	Distributed Storing And Parallel Querying Of PDM Document Based On NoSQL
9	Research And Implementation Of Cloud Cache Service Construction Technology Based On NoSQL Database
10	Research And Application Of Distributed Crawler Technology Based On Ant Colony Algorithm