Mobile Internet Websites And Server Traffic Analysis Based On Hadoop

Posted on: 2015-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: T T Li
Full Text: PDF
GTID: 2298330467963418
Subject: Signal and Information Processing
Abstract/Summary:
In recent years the Internet in China has developed rapidly, especially the mobile Internet. The proportion of Internet subscribers who use the mobile Internet has risen to 78.5%. As mobile Internet technology has developed, mobile Internet traffic data has grown dramatically and the demands on storage capacity and transfer speed have kept rising, so traditional database systems can no longer meet the needs of massive traffic data analysis. Google's MapReduce parallel programming model, which analyzes massive data on top of a distributed file system, has become an important way to ensure efficient operation and high-speed data processing. Hadoop, an open source Apache project, uses distributed storage to improve read/write rates and expand storage capacity; it is widely recognized in industry and academia and has become an important tool for massive data analysis.
This thesis first introduces the research status of mobile Internet website analysis, then the definition and methods of data mining, and briefly characterizes the era of big data. It then presents the Hadoop system, including the Hadoop Distributed File System (HDFS) and the MapReduce computation model, and introduces a website and server traffic data mining platform built on Hadoop, together with its applications in data mining.
The thesis covers three aspects: server identification and traffic analysis, website visit analysis, and analysis of the traffic characteristics of service providers (SPs). Server IP identification uses four methods to find the correspondence between server IPs and domain names, which is then used to analyze server traffic data: the direct extraction method, the similar-URI method, the referer method, and the adjacent IP mask method. Website visit analysis computes and analyzes the integrated traffic volume, visit count, and number of visiting subscribers for each domain, and does the same for each domain suffix. SP traffic characteristic analysis examines each SP's integrated traffic volume, overall visit count, and number of accessing subscribers; it reveals how mobile Internet traffic is distributed across SPs and uncovers hidden relationships between SPs. Finally, the SPs are clustered according to their characteristics using the k-means clustering algorithm.
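As a rough illustration of the website visit analysis described above, the following sketch computes traffic volume, visit count, and distinct subscribers per domain in a MapReduce style. It is not the thesis's implementation: it imitates the map, shuffle, and reduce phases in plain Python rather than running on Hadoop, and the record layout (subscriber ID, URL, bytes), the sample data, and all function names are assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical record layout: (subscriber_id, url, bytes_transferred).
RECORDS = [
    ("u1", "http://news.example.com/a", 1200),
    ("u2", "http://news.example.com/b", 800),
    ("u1", "http://video.example.cn/x", 5000),
    ("u1", "http://news.example.com/c", 300),
]


def map_record(record):
    """Map step: emit (domain, (bytes, subscriber_id)) for one record."""
    subscriber_id, url, nbytes = record
    domain = urlsplit(url).hostname
    return domain, (nbytes, subscriber_id)


def shuffle(pairs):
    """Group mapped values by key, standing in for the framework's shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_domain(domain, values):
    """Reduce step: total traffic, visit count, and distinct subscribers."""
    traffic = sum(nbytes for nbytes, _ in values)
    visits = len(values)
    subscribers = len({sub for _, sub in values})
    return domain, {"traffic": traffic, "visits": visits, "subscribers": subscribers}


def domain_stats(records):
    """Run map, shuffle, and reduce over an in-memory list of records."""
    groups = shuffle(map_record(r) for r in records)
    return dict(reduce_domain(d, v) for d, v in groups.items())


STATS = domain_stats(RECORDS)
```

Keying the map output on the domain suffix instead of the full hostname would give the per-suffix statistics in the same way. On a real cluster, counting distinct subscribers per key this way assumes each key's values fit in one reducer's memory.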
Keywords/Search Tags: mobile Internet, distributed computing, massive data processing, traffic analysis