
Research On E-commerce User Profile Based On Distributed Cluster

Posted on: 2024-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: A X Shi
GTID: 2558307136995339
Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology, network resources and user-related data are growing explosively, and more and more Internet products are moving from the traditional buyout model to a subscription model, which in turn drives research on user profiling. Every Internet company needs an accurate user profiling system to gain better insight into users' needs and thus improve its products. However, traditional machine learning algorithms are not suited to computation on massive data, and some current research on classification and prediction can hardly meet this demand. In this thesis, we use a data warehouse to organize massive user data hierarchically and improve, from different perspectives, on existing work in user value classification and user churn prediction. The main research contents are as follows:

(1) Research on a user value classification method based on an improved RFM model. The traditional RFM model has few analysis indicators, the time of last purchase is subject to a degree of randomness, and purchase frequency and purchase amount are strongly positively correlated. To address these problems, this thesis proposes an e-commerce user value classification method based on an improved RFM model. Firstly, the set of user analysis indicators is extended, and all of a user's interactions on the e-commerce platform are integrated into a user activeness attribute. Then the values of each indicator are binned to discretize the continuous indicator values, and data standardization is applied to reduce the influence of differing value ranges on the distance calculation. Next, a judgment matrix is constructed using the analytic hierarchy process (AHP) to calculate the weight of each indicator. Finally, the K-Means|| algorithm, which is suitable for parallel computation on massive data, is used to cluster the users, and the K value that gives the best clustering model is selected according to the elbow method and the silhouette coefficient. Comparison experiments show that the resulting RFME model has a higher silhouette coefficient and a more objective value classification system than the traditional RFM model and some other improved RFM models, and that it classifies users more accurately and faster.

(2) Research on a user churn prediction method based on the Tent-SSA-CatBoost algorithm. Traditional machine learning algorithms have difficulty coping with large-scale training samples, are sensitive to the representation of the input data, and handle unbalanced data poorly. This thesis therefore proposes a user churn prediction model based on the improved Tent-SSA-CatBoost algorithm. Firstly, an improved Borderline-SMOTE algorithm is used to address class imbalance in the dataset and reduce the impact of the unbalanced sample distribution. Secondly, because the sparrow search algorithm (SSA) generates its initial population randomly, which can easily lead to an uneven distribution and to falling into local optima, the Tent chaotic map, whose values are uniformly distributed, is used to generate the initial population, improving population diversity and enhancing the global search ability of the algorithm. Finally, the improved sparrow search algorithm is used to optimize the hyperparameters of CatBoost, ensuring convergence speed and finding the optimal combination of hyperparameters quickly. Tests on two user churn datasets show that the Tent-SSA-CatBoost algorithm achieves higher prediction accuracy than the standalone XGBoost, LightGBM, and CatBoost algorithms.

(3) Finally, to meet the needs of a user profile management system, we combine the data warehouse of a Hadoop + Hive distributed cluster with the Spring Boot + Vue technical framework and develop the system management, data management, message management, data display, and other modules of the user profile system. The system presents its functions visually and makes the results of user value classification and churn prediction easier to understand.
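The Tent chaotic map initialization described in part (2) can be illustrated with a minimal sketch. The abstract does not give the map's parameters or the search ranges, so everything concrete here is an assumption: the skew parameter `a = 0.7`, the seed `x0 = 0.37`, the function names `tent_map` and `tent_population`, and the two example bounds (loosely standing in for a learning rate and a tree depth). It shows only how a chaotic sequence can replace random sampling when building an initial population, not the thesis's actual implementation.

```python
def tent_map(x, a=0.7):
    """One step of the (skew) tent map on [0, 1].

    a is an assumed skew parameter; a value other than 0.5 is used here
    because the symmetric map tends to collapse to 0 in floating point.
    """
    return x / a if x < a else (1.0 - x) / (1.0 - a)


def tent_population(n_individuals, lower, upper, x0=0.37, a=0.7):
    """Build an initial population from a tent-map sequence.

    lower and upper are per-dimension search bounds (hypothetical values
    in the usage below). Each chaotic value in [0, 1] is scaled linearly
    into the corresponding [lower, upper] interval.
    """
    population = []
    x = x0
    for _ in range(n_individuals):
        individual = []
        for lo, hi in zip(lower, upper):
            x = tent_map(x, a)  # advance the chaotic sequence
            individual.append(lo + x * (hi - lo))
        population.append(individual)
    return population


if __name__ == "__main__":
    # Illustrative bounds only: dimension 0 ~ learning rate, dimension 1 ~ depth.
    for individual in tent_population(5, [0.01, 1.0], [0.3, 10.0]):
        print(individual)
```

Because the sequence is deterministic for a given `x0`, the same initial population is reproduced on every run; a different seed yields a different but similarly spread population.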
Keywords/Search Tags:User profile, Value classification, RFM model, Clustering algorithm, Churn prediction, Sparrow search algorithm