
Research And Implementation Of Realtime Stream Computing Data Analysis System On Http Messages

Posted on: 2017-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: D Pan
GTID: 2348330512964405
Subject: Engineering
Abstract/Summary:
The internet is developing rapidly, and the amount of data, such as urban data, medical data and web data, is growing just as fast. These data reflect patterns in people's daily lives and are worth analyzing. However, a large portion of them cannot be kept because storage is limited. HTTP messages are one example: when a user visits a website, saving every HTTP message would make storage usage grow at an alarming rate, so the HTTP data are normally discarded once the user has finished browsing. To extract the value of these HTTP data, this thesis analyzes them with stream computing, a real-time data analysis method. In the HTTP stream computing system, HTTP data are analyzed as soon as they arrive and only the results are saved. The step of storing the original data is therefore avoided, and the value implicit in the HTTP data can be put to good use.

This thesis builds a stream computing system that analyzes HTTP data in real time. The analysis results include users' PV/UV (page views and unique visitors), stay time, visit depth, etc. These results can support higher-level user behavior analysis, inform site owners' decision-making, guide the construction of the website, or verify the site's marketing results. The main contributions of this thesis are as follows:

(1) A user-behavior analysis node cluster that determines user information by filtering HTTP messages in real time. The analysis nodes in the cluster work in parallel as an assembly line. The results include the number of new users, users' access depth, users' stay time, users' coordinates, etc.
(2) An algorithm that determines users' visit counts in real time. It uses a set structure to filter HTTP messages, identifies a single user operation among the many HTTP messages it produces, and performs the calculation.
(3) An algorithm that calculates users' access depth in real time. It uses a hash table to build a circular counting queue, which implements a sliding window for the calculation.
(4) An algorithm that calculates users' stay (residence) time in real time. It uses two hash tables and a set of timers to compute how long each user stays on the site.
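
The abstract does not say how the analysis nodes are connected. As a minimal single-process sketch of the assembly-line idea, assuming each node is a function that either transforms a message or drops it by returning None, the hypothetical `run_pipeline` below runs each node in its own thread and links neighbouring nodes with FIFO queues, so different stages work on different messages at the same time. A real cluster would place nodes on separate machines; Python threads here only illustrate the dataflow, not CPU parallelism.

```python
import queue
import threading

_STOP = object()  # sentinel that marks the end of the stream


def _node(fn, inbox, outbox):
    """One analysis node: apply fn to each item and forward non-None results."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next node
            return
        result = fn(item)
        if result is not None:  # None means "filtered out"
            outbox.put(result)


def run_pipeline(items, fns):
    """Run fns as a chain of concurrent nodes and collect the final outputs."""
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [
        threading.Thread(target=_node, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_STOP)
    for t in threads:
        t.join()
    results = []
    while (item := queues[-1].get()) is not _STOP:
        results.append(item)
    return results


# Usage: parse "user url" lines, drop script requests, keep the user ID.
nodes = [
    lambda line: line.split(),
    lambda parts: None if parts[1].endswith(".js") else parts,
    lambda parts: parts[0],
]
print(run_pipeline(["u1 /index", "u1 /app.js", "u2 /news"], nodes))  # ['u1', 'u2']
```

Because each stage has exactly one thread and every queue is FIFO, output order matches input order.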
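
A sketch of real-time PV/UV and visit counting, assuming each captured request has been reduced to a hypothetical `HttpMessage(user_id, url, timestamp)` record. Requests for static resources (scripts, stylesheets, images) are treated as side effects of a single page view and filtered out, and a set of user IDs gives UV. Splitting a user's page views into visits with a 30-minute inactivity timeout is a common web-analytics convention swapped in here for illustration, not necessarily the rule used in the thesis; the suffix list and the timeout are assumptions.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class HttpMessage:
    """Hypothetical record for one captured HTTP request."""
    user_id: str      # e.g. derived from a cookie (assumption)
    url: str
    timestamp: float  # seconds


# Assumed: these requests are triggered by a page load, not by a user action.
STATIC_SUFFIXES = (".js", ".css", ".png", ".jpg", ".gif", ".ico")
VISIT_TIMEOUT = 30 * 60  # assumed: a gap longer than 30 min starts a new visit


def is_page_view(msg):
    return not urlparse(msg.url).path.lower().endswith(STATIC_SUFFIXES)


class VisitCounter:
    """Consumes messages once; only counters and per-user state are kept."""

    def __init__(self, timeout=VISIT_TIMEOUT):
        self.timeout = timeout
        self.pv = 0
        self.users: set[str] = set()              # distinct users (UV)
        self.last_seen: dict[str, float] = {}     # user -> latest page view time
        self.visits = 0

    @property
    def uv(self):
        return len(self.users)

    def process(self, msg):
        if not is_page_view(msg):
            return  # filtered out and never stored
        self.pv += 1
        self.users.add(msg.user_id)
        last = self.last_seen.get(msg.user_id)
        if last is None or msg.timestamp - last > self.timeout:
            self.visits += 1  # first page view, or user returned after the timeout
        self.last_seen[msg.user_id] = msg.timestamp


c = VisitCounter()
for m in [
    HttpMessage("u1", "http://a.com/index.html", 0),
    HttpMessage("u1", "http://a.com/app.js", 1),
    HttpMessage("u1", "http://a.com/news", 60),
    HttpMessage("u2", "http://a.com/", 100),
    HttpMessage("u1", "http://a.com/", 4000),
]:
    c.process(m)
print(c.pv, c.uv, c.visits)  # 4 2 3
```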
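
One possible reading of the circular counting queue, assuming access depth means the number of page views a user made within a recent time window: a hash table maps each user to a fixed-size ring of per-slot counters, and moving time forward zeroes the slots that fall out of the window, so the sum of the ring is the sliding-window count. The class name `SlidingWindowDepth`, the slot count and the slot length are illustrative assumptions, not details from the thesis.

```python
from dataclasses import dataclass


@dataclass
class _Ring:
    counts: list  # circular queue of per-slot page-view counters
    newest: int   # absolute index of the most recent slot in the ring


class SlidingWindowDepth:
    """Per-user page-view count over the last num_slots * slot_seconds seconds."""

    def __init__(self, num_slots=30, slot_seconds=60):
        self.n = num_slots
        self.slot_seconds = slot_seconds
        self.rings = {}  # hash table: user_id -> _Ring

    def _advance(self, ring, slot):
        """Move the window forward to `slot`, zeroing buckets that expire."""
        if slot - ring.newest >= self.n:
            ring.counts = [0] * self.n  # the whole window has expired
        else:
            for s in range(ring.newest + 1, slot + 1):
                ring.counts[s % self.n] = 0  # reuse the expired bucket
        ring.newest = slot

    def record(self, user_id, timestamp):
        slot = int(timestamp // self.slot_seconds)
        ring = self.rings.get(user_id)
        if ring is None:
            ring = self.rings[user_id] = _Ring([0] * self.n, slot)
        elif slot > ring.newest:
            self._advance(ring, slot)
        elif slot <= ring.newest - self.n:
            return  # older than the window; drop it
        ring.counts[slot % self.n] += 1

    def depth(self, user_id, now):
        ring = self.rings.get(user_id)
        if ring is None:
            return 0
        slot = int(now // self.slot_seconds)
        if slot > ring.newest:
            self._advance(ring, slot)
        return sum(ring.counts)


d = SlidingWindowDepth(num_slots=3, slot_seconds=10)  # 30-second window
for t in (0, 5, 15):
    d.record("u1", t)
print(d.depth("u1", 25), d.depth("u1", 35), d.depth("u1", 100))  # 3 1 0
```

Memory per user is fixed at `num_slots` counters regardless of traffic, which fits the goal of keeping results rather than raw messages.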
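
A sketch of stay-time calculation with two hash tables and timers, assuming a visit ends after a fixed inactivity timeout. One table holds the start of each user's current visit and the other holds the time of the user's latest request; each request arms a timer, and when a timer expires with no newer activity the visit is closed and `last - first` is emitted. To keep the example deterministic, timers are modeled as a min-heap of deadlines checked against message timestamps rather than OS timers; `StayTimeTracker` and its timeout are hypothetical. As with most log-based measurements, time spent on the last page of a visit is not observable, so a single-request visit has a stay time of 0.

```python
import heapq


class StayTimeTracker:
    """Emit (user_id, stay_seconds) once a user has been idle for `timeout`."""

    def __init__(self, timeout=30 * 60):
        self.timeout = timeout
        self.first_seen = {}  # hash table 1: user_id -> start of the current visit
        self.last_seen = {}   # hash table 2: user_id -> time of the latest request
        self.timers = []      # min-heap of (deadline, user_id)
        self.finished = []    # completed visits: (user_id, stay_seconds)

    def on_message(self, user_id, ts):
        self.fire_timers(ts)                     # close visits that expired before ts
        self.first_seen.setdefault(user_id, ts)  # first request opens a visit
        self.last_seen[user_id] = ts
        heapq.heappush(self.timers, (ts + self.timeout, user_id))

    def fire_timers(self, now):
        while self.timers and self.timers[0][0] < now:
            deadline, user_id = heapq.heappop(self.timers)
            last = self.last_seen.get(user_id)
            if last is None or last + self.timeout != deadline:
                continue  # stale timer: the user was active again later
            start = self.first_seen.pop(user_id)
            del self.last_seen[user_id]
            self.finished.append((user_id, last - start))


tracker = StayTimeTracker(timeout=10)
for user, ts in [("u1", 0), ("u1", 5), ("u2", 7), ("u1", 30)]:
    tracker.on_message(user, ts)
tracker.fire_timers(float("inf"))  # flush at end of stream
print(tracker.finished)  # [('u1', 5), ('u2', 0), ('u1', 0)]
```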
Keywords/Search Tags: stream computing, user behavior, visit depth, residence time