
Rapid Identification of SMS Seed Users Based on Information Flow Under Big Data

Posted on: 2017-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y X J Xie
Full Text: PDF
GTID: 2309330482998092
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Identifying information seed users is increasingly important in modern telecommunications, because these users can play a large role at critical moments. Applying big data processing to seed user identification yields results faster and with better quality than traditional methods. This thesis begins with a brief introduction to the background of telecom services and SMS seed users, the definition of big data, and its characteristics and application value. It then introduces two methods for identifying seed users, density clustering and recursive search over a tree network structure, and applies them in an empirical analysis. Both algorithms take too long to run and cannot cope with data at big data scale, so the fourth chapter focuses on improving and extending the tree model. It first analyzes two factors that influence whether a user becomes a seed user, time preference and attribute characteristics, and two main characteristics of information transmission, propagation time difference and direction. On this basis, the thesis proposes a method for quickly building the tree network structure and quickly finding seed users. The first step is data cleaning and preprocessing. Typical approaches to cleaning big data include methods based on functional dependencies and inclusion dependencies, methods based on user-defined constraints, statistical learning approaches, and causality-based methods.
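The tree network structure and recursive search mentioned above can be illustrated with a minimal Python sketch. It is not the thesis's implementation, which the abstract does not describe. It assumes SMS records are `(sender, receiver, timestamp)` tuples, attaches each user to whoever reached them first, and treats tree roots as possible seed users ranked by the size of the subtree they reach. The names `build_tree`, `subtree_size`, and `rank_seeds` are hypothetical.

```python
from collections import defaultdict

def build_tree(records):
    """Build a forwarding forest from (sender, receiver, timestamp) records.

    Assumption: a receiver's parent is the sender who reached them first.
    """
    parent = {}       # user -> the user who first messaged them
    first_seen = {}   # user -> timestamp of first appearance
    for sender, receiver, ts in sorted(records, key=lambda r: r[2]):
        if sender == receiver:
            continue
        first_seen.setdefault(sender, ts)
        if receiver not in first_seen:
            first_seen[receiver] = ts
            parent[receiver] = sender
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    # Roots are users who sent a message before ever receiving one.
    roots = [user for user in first_seen if user not in parent]
    return roots, children

def subtree_size(user, children):
    """Recursively count the user plus everyone reached through them."""
    return 1 + sum(subtree_size(c, children) for c in children.get(user, ()))

def rank_seeds(records):
    """Return (root, reach) pairs, largest reach first."""
    roots, children = build_tree(records)
    sizes = [(root, subtree_size(root, children)) for root in roots]
    return sorted(sizes, key=lambda item: (-item[1], item[0]))

# Hypothetical sample data, for illustration only.
records = [("A", "B", 1), ("A", "C", 2), ("B", "D", 3),
           ("E", "F", 4), ("D", "A", 5)]
print(rank_seeds(records))  # [('A', 4), ('E', 2)]
```

The recursion visits every node once per root, and deep forwarding chains can hit Python's recursion limit, which is consistent with the abstract's remark that this style of search becomes too slow on large data.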
For seed users, this thesis chooses the more flexible approach based on user-defined constraints. Users are first divided into groups according to their attribute characteristics, that is, their industry attributes. The relationship between SMS distribution and flow time is then analyzed and calculated to determine the direction of information flow. This gradually narrows the search range, so that only the group at the source of the flow needs to be examined. Within that group, candidate seed users are selected with a screening threshold and then verified with an evaluation system designed for seed users, the tree evaluation model. The final scores computed by the evaluation model determine which candidates are seed users. Finally, a comparison of the two models shows that the improved information flow model is superior to the tree network model.
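The narrowing steps described above (group by industry, find the group at the source of the flow, screen by threshold) might look roughly like the following Python sketch. The abstract does not give the actual rules, so everything here is an assumption for illustration: the source group is taken to be the group whose members send the earliest message, and the screening threshold is a minimum number of distinct recipients. The names `source_group`, `candidate_seeds`, and `min_recipients` are hypothetical, and the tree evaluation model's scoring is not reproduced.

```python
from collections import defaultdict

def source_group(records, industry):
    """Pick the industry group whose members sent the earliest message.

    Assumption: the earliest sender marks the source of the information flow.
    """
    earliest = {}
    for sender, _receiver, ts in records:
        group = industry[sender]
        if group not in earliest or ts < earliest[group]:
            earliest[group] = ts
    return min(earliest, key=earliest.get)

def candidate_seeds(records, industry, min_recipients=2):
    """Return the source group and its users who pass the threshold."""
    group = source_group(records, industry)
    recipients = defaultdict(set)
    for sender, receiver, _ts in records:
        if industry[sender] == group:
            recipients[sender].add(receiver)
    picked = [(user, len(reached)) for user, reached in recipients.items()
              if len(reached) >= min_recipients]
    # Most recipients first; ties broken by user name for stable output.
    return group, sorted(picked, key=lambda item: (-item[1], item[0]))

# Hypothetical sample data, for illustration only.
industry = {"A": "retail", "B": "retail", "C": "bank",
            "D": "bank", "E": "retail"}
records = [("A", "B", 1), ("A", "C", 2), ("A", "E", 3),
           ("B", "D", 4), ("C", "D", 5)]
print(candidate_seeds(records, industry))  # ('retail', [('A', 3)])
```

Because only one group is scanned in the second pass, the work per record stays constant, which matches the abstract's motivation of shrinking the search range before evaluating candidates.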
Keywords/Search Tags: Big Data, Seed users, Information Flow, Information density, Tree Evaluation Model