
Research On The Application Of Machine Learning In Commodity Recommendation Based On Spark Environment

Posted on: 2024-01-28 | Degree: Master | Type: Thesis
Country: China | Candidate: Y X Wang | Full Text: PDF
GTID: 2568307073476584 | Subject: Applied statistics
Abstract/Summary:
With the emergence of major e-commerce platforms and the growing number of online shoppers, users have difficulty choosing from a wide variety of commodities, while each shopping platform tries various marketing methods to stimulate users' willingness to buy and so increase sales across its large user base. Recommendation systems address both needs and play an indispensable role today: they use algorithms to select, from a huge amount of data, the information or commodities that a user wants.

In this thesis, we use real user behavior data from the Alibaba mobile e-commerce platform. After processing, we obtain user-commodity interaction data from November 22, 2014 to December 5, 2014, a span of two weeks containing about 10 million samples. The user behavior data of the first week is used as the training set, and the data of the second week is used as the test set. The prediction goal is whether a user purchases, on the Friday of each week, an item that the user interacted with between the previous Saturday and that Thursday; a purchase is the positive class and no purchase is the negative class.

First, features are extracted in the training set and the test set from three perspectives: user, commodity, and commodity category. Because samples with purchase behavior make up a very small percentage of the data, and this imbalance would cause model performance to fail, the samples without purchase behavior are downsampled based on K-means clustering. The sampled data is used as the final modeling data in this thesis.

Second, considering the large amount of data, this thesis works in the Spark distributed environment and first uses logistic regression (LR), gradient boosting decision tree (GBDT), and random forest (RF) separately for modeling and analysis. The experimental results show that GBDT performs best on both the training set and the test set. To further improve the prediction performance, a combined GBDT-RF-LR model is constructed: GBDT and RF are combined in parallel to output an enhanced feature matrix, which is merged with the training data to form a new data set, and logistic regression is then trained on that data set. Finally, a comparison of the results in both environments shows that the prediction accuracy of the model is higher in the Spark distributed environment. This thesis therefore concludes that, for commodity recommendation, the best result is achieved by applying the combined model in the Spark environment.
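
The K-means-based downsampling of non-purchase samples mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the clusters are used, so the scheme here is an assumption: negatives are clustered, and each cluster contributes to the retained sample in proportion to its size, so that the kept negatives still cover every region of the feature space. The function names `kmeans` and `downsample_negatives`, the parameters `n_keep` and `k`, and the toy two-dimensional data are all hypothetical, and the code is plain single-machine Python rather than the Spark implementation used in the thesis.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def downsample_negatives(negatives, n_keep, k=3, seed=0):
    """Keep roughly n_keep negatives, sampled per cluster in proportion
    to cluster size (an assumed scheme, not taken from the thesis)."""
    rng = random.Random(seed)
    labels = kmeans(negatives, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(negatives, labels) if lab == c]
        quota = round(n_keep * len(members) / len(negatives))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept


# Toy usage: 300 synthetic "no purchase" feature vectors in three groups.
demo_rng = random.Random(1)
negatives = [
    (demo_rng.gauss(cx, 0.3), demo_rng.gauss(cy, 0.3))
    for cx, cy in [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    for _ in range(100)
]
sampled = downsample_negatives(negatives, n_keep=30)
print(len(negatives), "negatives reduced to", len(sampled))
```

Because each cluster's quota is rounded independently, the number of retained samples can differ slightly from `n_keep`. In a Spark setting the same idea would presumably be expressed with a distributed clustering step followed by per-cluster sampling, but that mapping is not described in the abstract.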
Keywords/Search Tags: Recommendation System, Spark, Logistic Regression, Gradient Boosted Decision Tree, Deep Forest, Combination Model