Design And Implementation Of Front-end Buried Point Data Analysis System Based On Big Data

Posted on:2024-08-27

Degree:Master

Type:Thesis

Country:China

Candidate:L Li

Full Text:PDF

GTID:2568307070450564

Subject:Engineering

Abstract/Summary:

With the rapid development of the information industry, users have an almost endless choice of Internet applications. For a company's application to stand out among them, it must be creative enough and must also deliver a good user experience. Front-end buried point (event tracking) technology collects data on user behavior in a real network environment and reconstructs in detail how users actually use an application. It therefore provides strong support for improving and optimizing the product.

Buried point data is large in volume and heterogeneous. Analyzing it on the back end can degrade back-end performance, which in turn slows user operations and harms the user experience. Likewise, when the back end needs to merge and analyze data from several modules, storing the buried point data in each module's own database and then merging it for analysis is cumbersome and wasteful. To analyze large volumes of messy buried point data, reduce the development cost of the analysis system, and improve its reusability, this thesis builds a big data analysis system that brings the buried point data into a big data platform for analysis. This offers a practical and efficient way to compare buried point data across multiple applications.

In China, the data analysis service provider Shence Data already analyzes buried point data stored on a big data platform. It collects the data through an SDK interface and an HTTP endpoint. However, the HTTP headers also contain status codes, response codes, and other information unrelated to the buried point business. This thesis improves the collection method: each system pushes its buried point data into a Kafka message queue, a key big data component. The queue buffers load pressure, allows new data types to be added dynamically, and lets users adjust the collected parameters at run time. All buried point data is then stored in HBase, another key big data component, and HBase serves as the data source for unified analysis that runs independently of the back-end business modules. The thesis also improves the random forest algorithm by applying the harmonic mean to its voting mechanism, uses the improved algorithm to predict the degree of user churn from the collected data, and presents the predictions in the display system. Finally, the thesis implements an independent storage and analysis system that can be reused by multiple applications.

Test results show that the system meets its requirements, implements the intended functions, including buried point data collection and analysis, and achieves the expected goals. A performance test of the big data platform shows low latency (under 1 ms) at millions of requests, which meets the performance requirements. The improved random forest algorithm was evaluated on two public Kaggle datasets in terms of precision, accuracy, recall, F1-score, and AUC, and it improved all five metrics.
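The collection change described above (push only business fields into Kafka, and let users adjust which parameters are collected) can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's schema: the JSON envelope, its field names (`app_id`, `event`, `ts`, `props`), the function `build_event_message`, and the idea of a run-time list of collected fields are all hypothetical. The Kafka send itself is left out; with a client such as kafka-python, the returned bytes would be the message value, e.g. `producer.send("tracking-events", value=msg)`, where the topic name is also hypothetical.

```python
import json
import time
from typing import Any, Dict, Iterable, Optional


def build_event_message(
    app_id: str,
    event: str,
    raw_props: Dict[str, Any],
    collected_fields: Iterable[str],
    ts_ms: Optional[int] = None,
) -> bytes:
    """Serialize one tracking event, keeping only the configured fields.

    `collected_fields` stands in for a hypothetical run-time configuration
    that users can change without redeploying the front end.
    """
    allowed = set(collected_fields)
    # Drop every property that is not currently configured for collection,
    # so unrelated details (e.g. transport status codes) never reach the queue.
    props = {k: v for k, v in raw_props.items() if k in allowed}
    envelope = {
        "app_id": app_id,
        "event": event,
        # Millisecond timestamp; taken from the clock if the caller gives none.
        "ts": ts_ms if ts_ms is not None else int(time.time() * 1000),
        "props": props,
    }
    # Compact, key-sorted JSON so identical events serialize identically.
    return json.dumps(envelope, separators=(",", ":"), sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    msg = build_event_message(
        "shop-web",
        "click",
        {"button": "buy", "page": "/cart", "http_status": 200},
        collected_fields=["button", "page"],
        ts_ms=1700000000000,
    )
    print(msg.decode("utf-8"))
```

Because the filter reads its field list per call, adding a new parameter to the collection only requires changing the configuration, which matches the abstract's point that users can adjust the collected parameters dynamically.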
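The abstract stores all applications' buried point data in HBase and analyzes it as one data source, but it does not describe the table layout. The sketch below shows one common row-key pattern that would support per-application comparative scans. It is purely a hypothetical illustration, not the thesis's design: the key layout `app_id|salt|reversed_ts|user_id`, the bucket count, the `MAX_TS_MS` bound, and the function `make_row_key` are all assumptions.

```python
import hashlib

# Hypothetical bound: every millisecond timestamp up to this value fits in 13 digits.
MAX_TS_MS = 10**13 - 1


def make_row_key(app_id: str, user_id: str, ts_ms: int, buckets: int = 16) -> bytes:
    """Build a hypothetical HBase row key for one tracking event.

    Layout: app_id | salt | reversed timestamp | user_id
    """
    if not 0 <= ts_ms <= MAX_TS_MS:
        raise ValueError("ts_ms out of range")
    if not 1 <= buckets <= 100:
        raise ValueError("buckets must be between 1 and 100 (two-digit salt)")
    if "|" in app_id or "|" in user_id:
        # The separator must not appear inside components, or prefix scans break.
        raise ValueError("'|' is reserved as the key separator")
    # A stable hash of the user id picks a salt bucket, spreading one
    # application's writes over several key ranges instead of one hot spot.
    salt = int(hashlib.md5(user_id.encode("utf-8")).hexdigest(), 16) % buckets
    # Subtracting from the bound makes newer events sort before older ones;
    # zero padding keeps lexicographic byte order equal to numeric order.
    reversed_ts = MAX_TS_MS - ts_ms
    return f"{app_id}|{salt:02d}|{reversed_ts:013d}|{user_id}".encode("utf-8")


if __name__ == "__main__":
    print(make_row_key("shop-web", "u42", 1700000000000))
```

Putting `app_id` first means one application's events can be read with a prefix scan on `app_id|`, and several applications can be compared by scanning several prefixes; the trade-off is that time-ordered reads are only ordered within one salt bucket.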
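The abstract says the harmonic mean is applied to the random forest's voting mechanism but does not say how. One plausible reading, assumed here and not confirmed by the source, is to weight each tree's vote by its F1-score (the harmonic mean of precision and recall) on held-out data instead of giving every tree one equal vote. Another reading, not shown, would combine the trees' class probabilities with a harmonic rather than arithmetic mean. Tree predictions are plain lists, so no ML library is assumed; the names `f1_score` and `weighted_vote` are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = harmonic mean of precision and recall for the `positive` class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_vote(tree_votes: Sequence[Hashable], tree_weights: Sequence[float]) -> Hashable:
    """Return the class whose supporting trees have the largest total weight."""
    if len(tree_votes) != len(tree_weights):
        raise ValueError("need exactly one weight per tree")
    totals = defaultdict(float)
    for label, weight in zip(tree_votes, tree_weights):
        totals[label] += weight
    # Ties go to the class that appeared first in tree_votes (dict insertion order).
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Hypothetical validation labels and each tree's predictions on them.
    val_true = [1, 0, 1, 1, 0]
    val_preds_per_tree = [[1, 0, 1, 1, 0], [1, 1, 0, 1, 0], [0, 0, 0, 1, 1]]
    weights = [f1_score(val_true, preds) for preds in val_preds_per_tree]
    print(weights)
    # Votes of the three trees for one new user (1 = churn, 0 = stay).
    print(weighted_vote([1, 0, 0], weights))
```

With equal weights this reduces to the ordinary majority vote; with F1 weights, a single accurate tree can outvote several weak ones, which is the kind of effect a harmonic-mean-based vote would aim for on an imbalanced churn label.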

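The five evaluation metrics named in the abstract can be computed as below. This is a minimal standard-library sketch for a binary churn label, assuming a decision threshold of 0.5 on predicted scores (the abstract does not state one); `binary_metrics` is a hypothetical name. AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half, which is equivalent to the area under the ROC curve. For real datasets, the functions in `sklearn.metrics` (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`) are the usual choice.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall, F1 and AUC for labels in {0, 1}."""
    if len(y_true) != len(y_score) or not y_true:
        raise ValueError("y_true and y_score must be non-empty and equally long")
    # Threshold the scores to obtain hard predictions.
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC via pairwise comparison of positive and negative scores (O(P * N)).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "auc": auc}


if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0, 1], [0.9, 0.6, 0.7, 0.2, 0.4]))
```

The pairwise AUC is quadratic in the number of samples, which is fine for a sanity check but not for a full Kaggle dataset; a rank-based computation or the scikit-learn function scales better.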
Keywords/Search Tags:

Buried point data, Kafka, Big data, HBase, Random Forest

Related items

1	Research On Performance Optimization Methods For Kafka Message Systems
2	The Design And Implementation Of Real-time Processing System For Device Log Stream Data Based On Storm
3	Research And Implementation Of DSP Data Warehouse Optimization Based On Spark
4	Research On Strategy Of Imputing Missing Data Based On Random Forest
5	Research And Implementation Of Performance Tuning Method Of A Distributed Storage System Named Hbase
6	Big Data Flow Processing Analysis System Based On Kafka
7	Research For Imbalanced Big Data Classification Algorithm On Random Forest
8	Research And Development Of Key Technology Of Data Bus System Based On Kafka
9	Research And Application Of High Dimensional Imbalanced Data Classification Based On Random Forest
10	Research On Feature Selection And Classification Method Based On Random Forest For Medical Datasets