Research On The Application Of Machine Learning In Commodity Recommendation Based On Spark Environment

Posted on:2024-01-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Wang

Full Text:PDF

GTID:2568307073476584

Subject:Applied statistics

Abstract/Summary:

With the emergence of major e-commerce platforms and the growing number of online shoppers, users have difficulty choosing from a wide variety of commodities, while each shopping platform, faced with a large user base, tries various marketing methods to stimulate users’ willingness to buy and thereby increase sales. Given these needs, recommendation systems play an indispensable role today: they use algorithms to select, from a huge amount of data, the information or commodities that a user wants.

In this thesis, we use real user behavior data from the Alibaba mobile e-commerce platform. After processing, we obtain user-commodity interaction data from November 22, 2014 to December 5, 2014, a span of two weeks containing about 10 million samples. The user behavior data of the first week is used as the training set, and the data of the second week is used as the test set. The prediction goal is whether, on the Friday of each week, a user purchases an item that the user interacted with between the previous Saturday and that Thursday; a purchase is the positive class and no purchase is the negative class. First, features are extracted in the training set and test set from three perspectives: user, commodity, and commodity category. Since the percentage of samples with purchase behavior is very small and such class imbalance degrades model performance, the samples without purchase behavior are downsampled based on K-Means clustering. The sampled data is used as the final modeling data in this thesis.

Second, considering the large amount of data, this thesis works in the Spark distributed environment and first applies logistic regression (LR), gradient boosting decision tree (GBDT), and random forest (RF) separately for modeling and analysis. The experimental results show that GBDT performs best on both the training and test sets. To further improve prediction performance, a combined GBDT-RF-LR model is constructed: GBDT and RF are run in parallel to output an enhanced feature matrix, which is merged with the training data to form a new data set, and logistic regression is then trained on this data set. Finally, a comparison of the results in the two environments considered shows that the prediction accuracy of the model is higher in the Spark distributed environment. Therefore, this thesis concludes that, for commodity recommendation, the best result is achieved by applying the combined model in the Spark environment.
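The labeling rule described in the abstract (candidate pairs drawn from Saturday through Thursday, labeled by a purchase on the following Friday) can be illustrated with a minimal sketch. The record layout `(user_id, item_id, behavior, day)`, the behavior names such as `"buy"`, and the function name `build_samples` are assumptions made for illustration and are not taken from the thesis; this is plain Python rather than the Spark pipeline the thesis uses.

```python
from datetime import date, timedelta

# Hypothetical record layout: (user_id, item_id, behavior, day).
# Behavior names such as "view", "cart", and "buy" are illustrative assumptions.


def build_samples(records, friday):
    """Label (user, item) pairs for the week that ends on `friday`.

    A pair becomes a sample if the user interacted with the item between the
    previous Saturday and the Thursday before `friday` (both inclusive).
    The label is 1 if the user bought the item on `friday`, otherwise 0.
    """
    if friday.weekday() != 4:  # Monday is 0, so Friday is 4
        raise ValueError("friday must fall on a Friday")
    window_start = friday - timedelta(days=6)  # previous Saturday
    window_end = friday - timedelta(days=1)    # Thursday

    candidates = set()  # pairs seen inside the observation window
    purchases = set()   # pairs bought on the target Friday
    for user_id, item_id, behavior, day in records:
        if window_start <= day <= window_end:
            candidates.add((user_id, item_id))
        elif day == friday and behavior == "buy":
            purchases.add((user_id, item_id))

    # Only pairs with a prior interaction are labeled; a Friday purchase
    # without any interaction in the window does not create a sample.
    return {pair: int(pair in purchases) for pair in sorted(candidates)}
```

Under these assumptions, calling `build_samples(records, date(2014, 11, 28))` would yield the first-week (training) labels and `build_samples(records, date(2014, 12, 5))` the second-week (test) labels, matching the two-week span given in the abstract.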

Keywords/Search Tags:

Recommendation System, Spark, Logistic Regression, Gradient Boosted Decision Tree, Deep Forest, Combination Model

Related items

1	Personalized News Recommendation Based On Gradient Boosting Decision Tree
2	The Decision Tree Algorithm Of Commodity Recommendation
3	Thermal Power Plant Energy Saving Analysis Based On Spark Big Data Platform
4	Research On Hybrid Recommendation Algorithm Based On Deep Learning
5	Forestnet: A Learning Architecture Combining Deep Networks And Decision Forest
6	Citation Recommendation Based On Gradient Boosted Regression Trees
7	Research On Code Plagiarism Detection Model Based On Random Forest And Gradient Boosting Decision Tree
8	Wearable Sensor Activity Recognition Based On Deep Forest Research
9	K-Means-Gradient Lifting Algorithm For Screening Advertisements On Network Platform
10	Research And Application Of Recommendation Technology Based On Logistic Regression