Research And Implementation Of Log Collection And Analysis System Based On Big Data

Posted on: 2020-11-30

Degree: Master

Type: Thesis

Country: China

Candidate: K Yang

Full Text: PDF

GTID: 2428330578466565

Subject: Engineering

Abstract/Summary:

The rapid growth of the Internet has brought an ever-increasing number of users. With this huge user base and the development of network technology, China has entered the era of big data. Network service providers need to record information about user behavior, network operation, security, and so on for this very large user population, which generates a large volume of logs. In this context, an analysis system for massive network logs provides an effective means of collecting, processing, and analyzing them. This thesis proposes an architecture that uses Flume and Kafka to collect and buffer logs, HBase and Storm to persist logs and process them as streams, and the k-means clustering algorithm to analyze them. The analysis focuses on extracting user behavior from the logs in order to provide network service providers with reliable user-analysis data.

The thesis first reviews the state of research, in China and abroad, on log collection systems, analyzes the characteristics of logs in a big data setting, and, based on those characteristics, designs and implements the log collection and analysis system. Second, it studies and improves the clustering algorithm used in the log processing part.

In the log collection part, because the workload consists of massive logs, the system adopts a distributed architecture and uses multi-node Flume for collection. To improve reliability and prevent log loss, Flume acts as the producer of log messages and Kafka as the consumer of Flume's output. Kafka buffers the log data, so the system maintains throughput and reliability and can process massive log volumes effectively. Downstream of Kafka, Storm processes results in real time and HBase stores the processed log data.

In the log analysis part, the thesis first introduces the characteristics and ideas of clustering algorithms and summarizes existing optimization schemes for the k-means algorithm. Based on the application scenario of this thesis, it proposes a k-means optimization method that combines adaptive selection of the k value with attribute weights. Compared with the existing k-means algorithm, the method is more flexible and gives a more accurate clustering result.

Finally, the system architecture is presented from two aspects. The first is the architecture of the big data log collection component: the thesis analyzes the characteristics of big data logs and of user behavior analysis, and then describes the overall architecture design and configuration details of the component. The second is the front-end visual interface provided to users. A log collection and analysis system based on big data is designed and developed, covering the development environment, functional modules, system flow, and system testing. Users can easily configure the log collection service, view the clustering results visually, and download the detailed data.
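The abstract names two ingredients of the clustering improvement, attribute weights and adaptive selection of k, without giving their exact form. The Python sketch below is only one plausible reading and is not the thesis's method. It assumes a weighted Euclidean distance with caller-supplied weights, a deterministic farthest-first initialization, and selection of k by the highest mean silhouette score over a candidate range. The names `weighted_distance`, `kmeans`, `silhouette`, and `choose_k`, and the toy data, are all hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with one weight per attribute (assumed form)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def initial_centers(points, k, weights):
    """Farthest-first initialization: deterministic, starts from points[0]."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(
            points,
            key=lambda p: min(weighted_distance(p, c, weights) for c in centers),
        ))
    return centers


def kmeans(points, k, weights, iterations=50):
    """Lloyd iteration using the weighted distance above."""
    centers = initial_centers(points, k, weights)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center under the weighted distance.
        labels = [
            min(range(k), key=lambda c: weighted_distance(p, centers[c], weights))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def silhouette(points, labels, weights):
    """Mean silhouette coefficient; higher means better-separated clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(weighted_distance(p, q, weights) for q in own) / (len(own) - 1)
        b = min(
            sum(weighted_distance(p, q, weights) for q in other) / len(other)
            for m, other in clusters.items() if m != l
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, weights, k_candidates=range(2, 7)):
    """Pick the candidate k with the highest mean silhouette (assumed criterion)."""
    best = None
    for k in k_candidates:
        if k >= len(points):
            break
        labels, centers = kmeans(points, k, weights)
        score = silhouette(points, labels, weights)
        if best is None or score > best[0]:
            best = (score, k, labels, centers)
    return best[1], best[2], best[3]


# Toy data: three tight groups of hypothetical two-attribute user records.
points = [(x + dx, y + dy)
          for x, y in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]]
k, labels, centers = choose_k(points, weights=(1.0, 1.0))
print(k, labels)
```

Changing `weights` changes which attributes dominate the distance, and therefore both the grouping and the k that the silhouette criterion prefers. This is the kind of flexibility the abstract attributes to combining the two ideas.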

Keywords/Search Tags:

Big Data, Distributed Systems, Log Collection, Clustering Algorithm, K-means Algorithm

Related items

1	Research And Implementation Of Text Clustering Algorithm Based On Memory Calculation
2	Research And Improvement Of K-means Clustering Algorithm
3	The Improvement On The Fuzzy C-means Algorithm
4	Fuzzy C-means And K-means Clustering Algorithm And Its Parallel
5	Research And Distributed Implementation Of Cluster Algorithm Combined AFSA With K-means
6	Study Of Chinese Text Clustering On Improved K-means Algorithm
7	Ant Clustering Algorithm With K-harmonic Means Clustering
8	FCM Clustering And Research Of Its Increment Algorithm
9	Research And Application Of Clustering Algorithm Based On Dynamic Coupled Tissue-like P Systems
10	Improvements Of K-means Clustering Algorithm