| The rapid development of the Internet has brought an ever-growing number of Internet users. With this huge user base and the advance of network technology, China has entered the era of big data. Network service providers need to record information about user behavior, network operation, security, and so on for this large population of users, which generates a large volume of logs. In this context, an analysis system for massive network logs provides an effective means of collecting, processing, and analyzing them. This paper proposes an architecture that uses Flume and Kafka to collect and buffer logs, HBase and Storm to persist logs and process them as streams, and the k-means clustering algorithm to analyze them, thereby realizing the collection and analysis of massive logs. The analysis focuses on extracting user behavior from the logs so as to provide network service providers with reliable user analysis data. The paper first reviews the state of research on log collection systems in China and abroad, analyzes the characteristics of logs under big data, and uses these characteristics to design and implement the log collection and analysis system. Secondly, the clustering algorithm used in the log processing part is studied and improved. In the log collection part, because the system targets massive logs under big data, it adopts a distributed architecture in which multi-node Flume collects the logs. To improve the reliability of collection and prevent log loss, Flume acts as the producer of log messages and Kafka consumes Flume's output, which ensures the throughput and reliability of the system so that massive logs can be processed effectively. Kafka serves as a buffer for the log data; downstream, Storm processes results in real time and HBase stores the processed log data. In the log analysis part, the characteristics and ideas of clustering algorithms are introduced first, and existing optimization schemes for the k-means algorithm are summarized. Based on the actual application scenario of this paper, a k-means optimization method that combines adaptive selection of the k value with attribute weights is proposed; it clusters more flexibly and accurately than the existing k-means algorithm. Finally, the architecture of the system is introduced from two aspects. The first is the architecture of the big data log collection component: the paper analyzes the characteristics of big data logs and of user behavior analysis, and then presents the overall architecture design and configuration details of the component. The second is the front-end visual interface provided to users. A log collection and analysis system based on big data is designed and developed, covering the development environment, functional modules, system flow, and system testing; users can easily configure the log collection service, view the clustering results visually, and download detailed data. |
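
The abstract names a k-means variant that combines adaptive selection of k with attribute weights but does not spell out the procedure. The following is only a minimal sketch of one way such a combination could look, not the paper's method. It assumes that attribute weights enter as per-attribute factors in a weighted squared Euclidean distance, that k is chosen by the highest mean silhouette score, and that centers are seeded deterministically by farthest-first selection. All function names (`wdist2`, `init_centers`, `kmeans`, `silhouette`, `choose_k`) are hypothetical.

```python
import math


def wdist2(a, b, weights):
    """Weighted squared Euclidean distance; each weight scales one attribute."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def init_centers(points, k, weights):
    """Deterministic farthest-first seeding (an assumption, not from the paper)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(wdist2(p, c, weights) for c in centers))
        )
    return centers


def assign(points, centers, weights):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda c: wdist2(p, centers[c], weights))
        for p in points
    ]


def kmeans(points, k, weights, max_iter=100):
    """Plain Lloyd iterations using the weighted distance."""
    centers = init_centers(points, k, weights)
    for _ in range(max_iter):
        labels = assign(points, centers, weights)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centers[c])  # keep an empty cluster's center in place
        if updated == centers:
            break
        centers = updated
    return assign(points, centers, weights), centers


def silhouette(points, labels, weights):
    """Mean silhouette score under the weighted distance (singletons score 0)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(math.sqrt(wdist2(p, q, weights)) for q in own) / (len(own) - 1)
        b = min(
            sum(math.sqrt(wdist2(p, q, weights)) for q in other) / len(other)
            for m, other in clusters.items()
            if m != l
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, weights, k_max=6):
    """Try k = 2..k_max and keep the k with the best silhouette score."""
    best = None
    for k in range(2, min(k_max, len(points) - 1) + 1):
        labels, _ = kmeans(points, k, weights)
        score = silhouette(points, labels, weights)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best[1], best[2]
```

In this sketch, a weight of zero removes an attribute from the distance entirely, so a noisy log field can be suppressed without deleting it from the records. `choose_k` expects at least three points, and the silhouette criterion is only one of several possible ways to select k.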