
Brand Digital Marketing Based On Machine Learning Classification Algorithm

Posted on: 2019-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: J F Feng
GTID: 2429330551456184
Subject: Applied Statistics
Abstract/Summary:
Since China was connected to the Internet in 1995, the Internet there has grown rapidly, and on March 5, 2015, when Premier Li Keqiang delivered the State Council's government work report, "Internet Plus" was elevated to the level of national strategy. According to the 41st statistical report released by the China Internet Network Information Center (CNNIC) on January 31, 2018, the number of Internet users in China had reached 772 million and online retail sales had reached a record 7.18 trillion yuan by the end of December 2017. The e-commerce landscape has shifted from the early dominance of Alibaba, to the "cat and dog" rivalry (commonly understood as Tmall versus JD.com), to today's diversified range of platforms. Consumer behavior has changed just as much, moving from offline to online consumption. Modern storage technology allows platforms to record users' behavioral characteristics, such as how many times a user searches on the platform, how many items the user views, and when these actions occur. Mining the consumer needs behind these behaviors and using them to promote sales appropriately has become a key opportunity for major e-commerce platforms and merchants.

This thesis focuses on statistical analysis and modeling of user behavior and user attributes on an e-commerce platform. The core task is to predict, from their characteristics, which users are likely to purchase a given brand in the future, which is essentially a classification problem.

First, user attributes and the previous month's behavior for the brand are obtained from the e-commerce platform. The attributes include gender, age, and sensitivity to promotions and reviews; the behavior data include the number of views of the category or brand, the number of purchases, the number of keyword searches, and the time of the most recent occurrence of each behavior. In total, 22 independent variables are selected, and the dependent variable y indicates whether the user buys the brand within the next 7 days. Second, the data are preprocessed, after which 19 independent variables and the dependent variable are retained. After descriptive statistics and correlation analysis of the preprocessed data, a traditional statistical method, the chi-square method, is used to make a rough prediction, and its predictive performance is found to be poor. Third, the data are randomly split into a training set (70%) and a test set (30%), and three popular machine learning classification algorithms, logistic regression, random forest (RF), and XGBoost, are each used to model the data. On the training set, XGBoost outperforms the other two algorithms in both accuracy and recall. On the test set, the three achieve almost the same accuracy; in recall, RF outperforms XGBoost, which in turn outperforms logistic regression. A noise robustness test then shows that RF is the most robust to noise.

Finally, the three models are used to predict the target customers for the next 7 days. Taking everything into consideration, RF and XGBoost are the better choices. First, XGBoost has the shortest running time of the three algorithms. Second, test-set accuracy differs little across the three algorithms, while in recall RF is superior to XGBoost and XGBoost is superior to logistic regression. Third, RF and XGBoost give the more prominent final prediction results.
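The evaluation procedure described above (a random 70%/30% split, then comparison by accuracy and recall) can be illustrated with a minimal, library-free sketch. The abstract does not say which software was used, so the function names `train_test_split` and `accuracy_and_recall`, the fixed seed, and the toy labels below are all assumptions made for illustration; none of the numbers come from the thesis.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of `rows` and return (train, test).

    Hypothetical helper: the 70/30 ratio follows the abstract,
    the seed and implementation are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels, where 1 = 'buys the brand'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # buyers correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-buyers correctly ignored
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # buyers the model missed
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Toy labels, made up for illustration only (not data from the thesis).
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.2f} recall={rec:.2f}")

# Splitting 100 hypothetical user IDs into training and test sets.
train_ids, test_ids = train_test_split(range(100))
print(len(train_ids), len(test_ids))
```

Reporting recall alongside accuracy is a reasonable design choice for this kind of task: accuracy counts every correct prediction, including the non-buyers, while recall measures only the share of actual buyers that a model finds, so two models can have nearly equal accuracy and still differ in recall, as the test-set comparison above reports.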
Keywords/Search Tags: Performance measurement, Correlation analysis, Target user, Logistic regression, Random forest, XGBoost