Analysis Of Internet Traffic And User Behavior Based On Co-clustering Algorithm

Posted on: 2017-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Liu
GTID: 2348330518995392
Subject: Information and Communication Engineering
Abstract/Summary:
The number of internet users in China is now close to 50% of the total population, which shows that the internet is widely used in people's daily activities and has greatly changed the way people live and work. The analysis of internet traffic and network user behavior has therefore become an important part of network research. At the same time, the continuous generation of massive data poses challenges to this research. The main work of this thesis is to study and analyze large-scale network traffic and user behavior using data mining algorithms and tools. Specifically, the thesis first builds an object-level internet traffic analysis model based on user click identification, namely the web object dependency graph. The graph model describes the dependency relationships among web objects and is high-dimensional, sparse, and complex, yet locally dense. To further explore the internal structure of the graph model, we design and implement a co-clustering algorithm based on non-negative matrix factorization, which is used to decompose the large-scale web object dependency graph and extract four typical web structure patterns. Finally, the thesis analyzes in depth the characteristics of these four types of web pages and the reasons they form.

The main contributions of this thesis are as follows. First, the thesis presents an object-level internet traffic analysis model based on user click identification, the web object dependency graph. The graph model describes the dependency relationships between primary objects and embedded objects in the network, which provides an effective mathematical model for further research on and mining of web structures. Second, based on the Spark distributed architecture, the thesis implements and optimizes a parallel orthogonal non-negative matrix tri-factorization (ONMTF) algorithm. The algorithm reduces the dimension of high-dimensional, sparse non-negative matrices, and the non-negativity and approximate orthogonality of the factors make the decomposition easier to interpret. An SVD-based matrix initialization method helps the ONMTF algorithm reach a better local optimum. Third, the thesis applies the parallel ONMTF algorithm to decompose the large-scale web object dependency graph and extract four kinds of typical web structures, thereby uncovering the web structure patterns in the network.
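
To make the co-clustering idea concrete, the following is a minimal single-machine sketch of ONMTF, which approximates a non-negative matrix X by the product F S G^T with non-negative, approximately orthogonal factors F and G. It is an illustration under stated assumptions, not the thesis's implementation: it uses NumPy rather than Spark, random initialization rather than the SVD-based initialization, and the multiplicative update rules commonly used for ONMTF in the literature. The function name `onmtf` and all parameter names are hypothetical.

```python
import numpy as np


def onmtf(X, k_rows, k_cols, n_iter=200, eps=1e-9, seed=0):
    """Sketch of orthogonal non-negative matrix tri-factorization.

    Approximates X (n x m, non-negative) as F @ S @ G.T, where
    F is n x k_rows, S is k_rows x k_cols, and G is m x k_cols.
    Assumption: random initialization, not the SVD-based one.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k_rows)) + eps
    S = rng.random((k_rows, k_cols)) + eps
    G = rng.random((m, k_cols)) + eps

    for _ in range(n_iter):
        # Update the column-cluster indicator G.
        XtFS = X.T @ F @ S                      # m x k_cols
        G *= np.sqrt(XtFS / (G @ (G.T @ XtFS) + eps))

        # Update the row-cluster indicator F.
        XGSt = X @ G @ S.T                      # n x k_rows
        F *= np.sqrt(XGSt / (F @ (F.T @ XGSt) + eps))

        # Update the block-structure matrix S.
        FtXG = F.T @ X @ G                      # k_rows x k_cols
        S *= np.sqrt(FtXG / ((F.T @ F) @ S @ (G.T @ G) + eps))

    return F, S, G


# Toy dependency matrix with two dense blocks (hypothetical data):
# rows stand in for primary objects, columns for embedded objects.
X = np.zeros((6, 8))
X[:3, :4] = 1.0
X[3:, 4:] = 1.0

F, S, G = onmtf(X, k_rows=2, k_cols=2)

# Each row and column is assigned to the cluster with the largest weight.
row_labels = F.argmax(axis=1)
col_labels = G.argmax(axis=1)
```

Because every update multiplies by a non-negative ratio, the factors stay non-negative throughout. In a distributed setting such as the one the thesis describes, the matrix products would presumably be computed over partitioned data, which this sketch does not attempt to show.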
Keywords/Search Tags: internet traffic, co-clustering, non-negative matrix factorization, web structure pattern, Spark