| With the advance of informatization, data is being generated faster than ever before. According to statistics, the number of Chinese netizens has reached 1.032 billion, and their network activities generate massive user-behavior logs. Using stream processing technology to mine the potential value of these logs in a timely manner is of great practical significance and is an important research direction in big data processing. This thesis implements a user-behavior-log statistics system based on the Flink stream processing engine. First, the statistical requirements based on resource information in user-behavior logs are analyzed in detail. The log collection service splits and stores the collected raw data to support both batch and stream processing; the processing service computes multi-dimensional statistics, including historical cumulative statistics and window statistics. Second, the system functions are designed and implemented, and two major problems in stream processing are solved. A two-layer log collection and dump system is implemented with Flume, and the logs are stored in HDFS and Kafka respectively, providing a stable data source for both batch and stream processing. With the help of Flink's window mechanism, multi-dimensional historical cumulative statistics and window-based statistics of resources in the user-behavior logs are computed, and the results are stored in Redis and HBase respectively. To relieve the storage pressure caused by highly concurrent processing, the system uses Flink's window function to aggregate input traffic, balancing the timeliness of data output against storage pressure. Using the Redis hash type, accumulation is performed without a prior read, which greatly reduces storage access pressure. Windowing based on event time is combined with a watermark mechanism based on processing time, which generates watermarks steadily under multi-source data flows and ensures the timeliness of data output. Through the design of the HBase rowkey, window data is stored at the minimum granularity so that late-arriving data is not lost. Finally, test cases are designed for the system functions, and the test results are analyzed to verify that the system functions meet the requirements. |
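
The combination of window pre-aggregation and read-free accumulation described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the in-memory `FakeRedisHash` class, the `tumbling_window_counts` and `flush_to_store` helpers, the sample events, and the 60-second window size are all hypothetical names and values chosen for illustration. A real deployment would presumably run the windowing in Flink and issue the Redis `HINCRBY` command instead of updating a Python dict.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical tumbling-window size


class FakeRedisHash:
    """In-memory stand-in for one Redis hash (assumption: no real Redis here)."""

    def __init__(self):
        self.fields = defaultdict(int)
        self.writes = 0  # number of increment commands issued

    def hincrby(self, field, amount):
        # Mirrors the semantics of Redis HINCRBY: increment in place and
        # return the new value, so the caller never reads the old value first.
        self.fields[field] += amount
        self.writes += 1
        return self.fields[field]


def tumbling_window_counts(events, window_seconds=WINDOW_SECONDS):
    """Pre-aggregate (event_time, resource_id) events per window and resource."""
    counts = defaultdict(int)
    for event_time, resource_id in events:
        # Align the event time to the start of its tumbling window.
        window_start = event_time - event_time % window_seconds
        counts[(window_start, resource_id)] += 1
    return dict(counts)


def flush_to_store(window_counts, store):
    """Send one increment per (window, resource) instead of one per event."""
    for (_window_start, resource_id), count in sorted(window_counts.items()):
        store.hincrby(resource_id, count)


# Hypothetical sample: (event time in seconds, resource id)
events = [(0, "video:1"), (5, "video:1"), (59, "video:2"),
          (60, "video:1"), (61, "video:1"), (119, "video:1")]

store = FakeRedisHash()
window_counts = tumbling_window_counts(events)
flush_to_store(window_counts, store)

print(dict(store.fields))  # {'video:1': 5, 'video:2': 1}
print(store.writes)        # 3 increments for 6 input events
```

In this sample, six events collapse into three increments. A larger window would reduce the number of writes further but delay the output, which is the trade-off between timeliness and storage pressure that the abstract refers to.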