
Price Forecast Of IT Internet Online Courses

Posted on: 2020-07-19
Degree: Master
Type: Thesis
Country: China
Candidate: D L Tang
Full Text: PDF
GTID: 2437330578976712
Subject: Mathematical Statistics
Abstract/Summary:
With the advent of the era of paid knowledge and users' growing acceptance of online education, online courses have gradually evolved from traditional large-scale free public classes into paid courses. At this stage, both emerging and traditional websites that focus on paid courses are still exploring profit models, and the sheer variety of courses makes it hard for users to choose. Studying online course pricing is therefore important for increasing website revenue, retaining users, and promoting platform development. This thesis uses data on the "IT Internet" course category from the Tencent Classroom website to build statistical and algorithmic models of course price and to predict the price range of a course. The specific work is summarized as follows:

1. Collect and process data. Python is used to crawl the "IT Internet" category data from the Tencent Classroom website. The raw data are then cleaned and turned into valid modeling data: missing values and outliers are handled, variables are transformed and normalized, derived variables are constructed, and class imbalance is addressed.

2. Descriptive statistical analysis of the data. Course categories and sales characteristics are examined with bar charts and empirical cumulative distribution function plots; the times at which courses go online and offline are analyzed; and word clouds are drawn from the introduction texts to explore the hot words of the courses and the common concerns of the institutions offering them.

3. Construct multinomial logistic regression models for the price of IT Internet online courses. The price range is predicted both before and after the class imbalance is handled, the accuracy of the two sets of predictions is compared, and the models are evaluated.

4. Construct algorithmic models for the price of IT Internet online courses. Using k-nearest neighbor, decision tree, support vector machine, random forest, XGBoost, and other algorithms, the optimal parameters are selected through grid search, model performance is evaluated with accuracy, AUC, and F1 score, and the variables are ranked by importance to analyze the factors that influence course price.

The results show that the support vector machine and random forest models perform best on the data set, with test-set accuracies of 96.54% and 96.46%, respectively. This thesis obtains its data with a web crawler, makes reasonable use of both numerical and text variables, and applies data mining methods to the prediction of online course price intervals. The support vector machine and random forest models are selected as the final price prediction models. The results give users a data reference for evaluating course prices within a certain period of time, and give institutions feedback for optimizing their price ranges.
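As a rough illustration of three steps named in the abstract (turning prices into range labels, handling class imbalance, and scoring predictions with accuracy and F1), the following minimal Python sketch may help. It is not the thesis's code. The cut points in `PRICE_EDGES`, the choice of random oversampling as the balancing method, and all function names are assumptions made for illustration only; the abstract does not say which price intervals or which balancing technique were actually used.

```python
import bisect
import random
from collections import Counter

# Hypothetical price-range boundaries; the abstract does not give the real ones.
# Class 0: price < 100, class 1: 100 <= price < 500, class 2: price >= 500.
PRICE_EDGES = [100.0, 500.0]


def price_to_class(price, edges=PRICE_EDGES):
    """Map a price to the index of the left-closed interval it falls in."""
    return bisect.bisect_right(edges, price)


def oversample(rows, labels, seed=0):
    """Random oversampling (an assumed technique): duplicate minority-class
    rows at random until every class is as large as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up prices, used only to exercise the functions above.
prices = [0.0, 9.9, 49.0, 99.0, 199.0, 299.0, 1299.0]
labels = [price_to_class(p) for p in prices]
balanced_rows, balanced_labels = oversample(prices, labels)
counts = Counter(balanced_labels)
```

Macro-averaged F1 is used here because it weights every price range equally, which is one common choice when the classes are imbalanced; the abstract does not state which averaging the thesis used.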
Keywords/Search Tags: Online Courses, Multinomial Logistic Regression, k-Nearest Neighbor, Decision Tree, Support Vector Machine, Random Forest, XGBoost algorithm