Research On Communication Behavior Of Telecom Users Based On Large Data Platform

Posted on: 2018-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: D Yuan
GTID: 2359330518459435
Subject: Computer software and theory
With the rapid development of Internet technology, many large enterprises, institutions, and government departments now collect massive and varied data. Web logs record users' behavior and specific consumption, and they provide important guidance for building websites, promoting specific goods, and delivering accurate services. Log analysis requirements keep changing and growing, and the analysis itself must be both fast and accurate. How to process massive logs, how to store massive amounts of data, and how to extract useful information have therefore become focal topics in both industry and academia.

People's lives are now inseparable from the network, and much everyday activity takes place through website visits. To capture users' explicit and hidden needs, enterprises have made in-depth mining of users' network behavior a focus of study. As more and more users visit websites, the resulting data grows rapidly, and processing and storing it while extracting useful information becomes a further challenge. Existing research indicates that Hadoop-related technologies are the most suitable methods and tools for solving big data problems.

The data in raw Web log files is inconsistent and incomplete, and it contains a great deal of noise and dirty data. Without collection and preprocessing to filter and screen it, analysis takes more work and may even produce wrong results, so Web log data should be collected and preprocessed before it is analyzed. A telecom system generates massive Web log data every day, which single-node processing and traditional relational databases cannot handle, so how to store massive data is a necessary research topic. Web log mining is carried out by algorithms, so algorithm selection and design are also key.

This thesis studies the Web logs of a telecommunications system. Its main work is as follows:

1) Web log collection and preprocessing. Collection and preprocessing are prerequisites for Web log mining because they supply accurate log files for analysis. The raw Web logs contain noise and incomplete data, so they must be preprocessed. The rapid increase in the number of users, however, produces data volumes that make preprocessing a serious challenge. This thesis proposes a Web log preprocessing mechanism based on MapReduce, which improves preprocessing efficiency and makes better use of computing resources, reducing unnecessary waste.

2) Web log data storage. As the number of telecom users and website visits increases, Web logs grow daily, while traditional storage technology is expensive, complex to operate, and limited in scalability. The system combines HDFS and HBase to make full use of the advantages of distributed cluster storage.

3) Telecom system log mining and an improved clustering algorithm. Data mining is one of the core problems of big data, and it faces high computational complexity and insufficient computing power. This thesis proposes a distributed clustering algorithm based on Hadoop, built on an improved K-means algorithm. Experimental results show that the algorithm has good feasibility and accuracy.

Using comparative experiments on a big data simulation platform for telecom system log analysis, this thesis validates the high efficiency of the Web log preprocessing, the high scalability of the combined HDFS and HBase storage, and the accuracy of the improved K-means clustering algorithm. By analyzing the telecom system's logs, it extracts users' behavior information and, from the behavior characteristics of users visiting the website, helps telecom operators formulate reasonable packages and recommendation information.
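
The abstract does not give the details of the MapReduce preprocessing mechanism, so the following is only a minimal single-process sketch of the general idea: a map step that discards malformed or noisy records, and a reduce step that groups the remaining requests by user. The input format (Apache Common Log Format), the filtering rules, and the names `map_line`, `reduce_user`, and `preprocess` are all assumptions made for illustration; they are not taken from the thesis, and a real deployment would run the two steps as Hadoop jobs rather than in one Python process.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: logs are in Apache Common Log Format (the abstract names no format).
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
# Assumption: requests for static resources count as noise.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def map_line(line):
    """Map step: yield (ip, (timestamp, url)) for clean page requests only."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return  # incomplete or malformed record
    if not match["status"].startswith("2"):
        return  # non-2xx responses are treated as noise in this sketch
    url = match["url"].split("?", 1)[0]  # drop the query string
    if url.lower().endswith(STATIC_SUFFIXES):
        return  # static resource, not a page view
    timestamp = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    yield match["ip"], (timestamp, url)


def reduce_user(ip, records):
    """Reduce step: order one user's requests by time and keep the URLs."""
    return ip, [url for _, url in sorted(records)]


def preprocess(lines):
    """Run map, an in-memory stand-in for the shuffle, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            grouped[key].append(value)
    return dict(reduce_user(ip, recs) for ip, recs in grouped.items())


sample = [
    '10.0.0.1 - - [10/Oct/2017:13:55:36 +0800] "GET /plans/4g.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2017:13:55:30 +0800] "GET /index.html?ref=ad HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2017:13:55:37 +0800] "GET /static/site.css HTTP/1.1" 200 100',
    'truncated record',
    '10.0.0.2 - - [10/Oct/2017:14:00:00 +0800] "GET /missing HTTP/1.1" 404 0',
]
sessions = preprocess(sample)
```

Because each log line is handled independently in the map step, the filtering can be spread across cluster nodes, which is the property a MapReduce-based preprocessing stage relies on.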
Keywords: Hadoop, data mining, CFK-means algorithm, telecom system, K-means algorithm
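
For readers unfamiliar with the K-means algorithm named in the abstract and keywords, the sketch below shows the standard algorithm only. It is not the improved (CFK-means) variant, whose details the abstract does not describe. Each iteration is written as an assignment step followed by an averaging step, which is the shape that would typically map onto a MapReduce job. The function names, the two-dimensional feature vectors, and the initial centroids are hypothetical.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans_iteration(points, centroids):
    """One pass: assign points to centroids (map), then average each cluster (reduce)."""
    clusters = defaultdict(list)
    for point in points:
        clusters[nearest(point, centroids)].append(point)
    updated = []
    for index, centroid in enumerate(centroids):
        members = clusters.get(index)
        if not members:
            updated.append(centroid)  # keep an empty cluster's centroid unchanged
        else:
            updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return updated


def kmeans(points, centroids, max_iter=100, tol=1e-9):
    """Repeat iterations until no centroid moves by more than tol."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    labels = [nearest(point, centroids) for point in points]
    return centroids, labels


# Hypothetical per-user features, e.g. (visits per day, pages per visit).
users = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, labels = kmeans(users, centroids=[(0, 0), (10, 10)])
```

The initial centroids are passed in explicitly so that the run is deterministic; standard K-means is sensitive to this choice.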