| Large enterprises and Internet companies generate a large number of log files every day, and the volume of data has already exceeded the terabyte (TB) scale. How to consolidate scattered log data, how to process it quickly and effectively, and how to combine it with the business database to analyze users’ access behavior and preferences have become serious problems for every company that has begun working with big data. Solving these problems gives a company a clear view of its own development and allows its accumulated data to create more commercial value. From the point of view of data analysts, this thesis studies the data analysis techniques and solutions commonly used in enterprises. It compares them with the new techniques and tools that continue to emerge in the big data environment, analyzes the features of each technique, and proposes a solution that optimizes the analysis of user behavior data by integrating these techniques, ultimately realizing a user behavior analysis system consisting of data collection, data processing, data calculation, and data visualization. The work completed in this thesis includes: (1) Analyze the technical tools required by the system, including Flume and Kafka for data collection; MySQL, Infobright, and Hive for building the data warehouse; Pig, Impala, and Spark for data calculation; Kibana for generating visual graphs; and Elasticsearch for storing, analyzing, and managing heterogeneous data. By systematically comparing their characteristics, determine which tools to use in the final system. (2) Establish a data warehouse. By classifying the different types of log files by format and cleaning their data, generate clean data files in a standard format. 
Combine these files with data imported from the business database to build a dataset for each product and form the data warehouse, which serves as the core data of the whole user behavior analysis system. (3) Design and develop an automatic data visualization tool that automatically maps the data generated from the data warehouse into the file system of Elasticsearch. This resolves the tedious manual operations in routine data visualization solutions. The user behavior analysis system proposed in this thesis has been deployed at a domestic community Internet company. Over several months of operation, the system has run stably and performed well. It greatly improves the work efficiency of data analysts, simplifies the process of analyzing user behavior, and lets analysts focus on the analytic logic. |