
Research on Commodity Demand Forecasting and Sub-Warehouse Allocation Based on an Ensemble Learning Model under Feature Selection

Posted on: 2019-12-19    Degree: Master    Type: Thesis
Country: China    Candidate: J A Ding    Full Text: PDF
GTID: 2429330548478877    Subject: Logistics Engineering
Abstract/Summary:
Big data-driven supply chains give businesses great opportunities to significantly reduce operating costs, improve the user experience, and play a crucial role in raising the efficiency of the entire e-commerce industry. Warehousing is one of the important cost components of the supply chain. With the help of big data and cloud computing technology, analyzing and mining the data generated in e-commerce transaction scenarios makes it possible to forecast future commodity demand accurately, helping businesses to reduce storage costs while informing decision-making across the supply chain. In warehousing analysis, merchants' transaction data are massive, their dimensionality is often high, and their value density is relatively low, so model learning is highly complex and the learning effect is poor. This paper analyzes in detail the problems that existing feature selection algorithms and ensemble learning methods face when processing large data sets, proposes ideas and methods to reduce model complexity and improve prediction accuracy, and carries out a case study. The main contents and contributions are as follows:

1) A feature selection algorithm based on maximum correlation and minimum redundancy (MCMR) is proposed, which takes into account both the redundancy among features and the correlation between features and categories. Building on the Pearson correlation and angle cosine (COS) similarity measures, the redundancy between features is computed: the Pearson coefficient captures the linear correlation between features, the cosine similarity captures the nonlinear correlation, and the two are combined by exponential scaling into a total redundancy between features. The correlation between a feature and the categories is measured by the information gain ratio between the feature and the class label. The algorithm therefore accounts for both the linear and nonlinear relations among features and the correlation between features and categories, effectively removing the maximum redundancy among features and alleviating the problem of sample imbalance. Introducing two weighting parameters for inverse feature selection effectively avoids excessive traversal of the original feature set.
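The following Python fragment is a minimal sketch of the MCMR idea described in point 1). The exponential blending of Pearson and cosine similarity with weights alpha and beta, and the use of mutual information as a readily available stand-in for the information gain ratio, are assumptions made for illustration; the function name mcmr_select and its defaults are hypothetical rather than taken from the thesis.

    # Minimal sketch of an MCMR-style relevance/redundancy trade-off (illustrative, not the
    # thesis's exact formulation). Mutual information stands in for the information gain
    # ratio; alpha/beta are the assumed weighting parameters mentioned above.
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.metrics.pairwise import cosine_similarity

    def mcmr_select(X, y, k, alpha=0.5, beta=0.5):
        """Greedily pick k features with high class relevance and low mutual redundancy."""
        relevance = mutual_info_classif(X, y)                # feature-to-class relevance
        pearson = np.abs(np.corrcoef(X, rowvar=False))       # linear feature-feature similarity
        cosine = np.abs(cosine_similarity(X.T))              # angular feature-feature similarity
        # Assumed combination rule: exponentially scaled blend of the two similarity views.
        redundancy = 1.0 - np.exp(-(alpha * pearson + beta * cosine))

        selected = [int(np.argmax(relevance))]               # start from the most relevant feature
        while len(selected) < k:
            candidates = [j for j in range(X.shape[1]) if j not in selected]
            # Score = relevance minus average redundancy with the already selected subset.
            scores = [relevance[j] - redundancy[j, selected].mean() for j in candidates]
            selected.append(candidates[int(np.argmax(scores))])
        return selected

The greedy loop mirrors classical mRMR-style selection: each step adds the candidate that is most relevant to the class label and least redundant with the features already kept.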
2) An improved MCMR feature selection algorithm, rMCMR, is proposed. When the MCMR algorithm removes redundancy between features, it compares the average correlation coefficient between each highly correlated feature and the other features and removes the feature with the smaller mean. At this step the rMCMR algorithm additionally considers the information gain ratio between the highly correlated features and the categories and removes the less informative feature, refining MCMR by re-comparing the correlation between features and classes. This reduces the algorithm's time complexity and mitigates the excessive deletion of valuable information.

3) In regression analysis on large data sets, the relationships in the data are often nonlinear and no single curve can fit all points, whereas in a classification problem a number of hyperplanes can separate the points well. This paper therefore transforms the regression problem into a classification problem for feature selection, and then performs regression prediction on the selected feature subset, as sketched below. Compared with direct regression, the accuracy is significantly improved.

4) In machine learning there are certain differences between algorithm models, so this paper uses ensemble learning to fuse multiple algorithm models and effectively combine their advantages; the fused model performs better than any single model (a fusion sketch also follows below).

Experiments on UCI data sets compare MCMR with traditional feature selection algorithms such as CFS, WFS and Relief in terms of feature reduction and classification performance; the advantage of MCMR is that it retains fewer features while achieving higher accuracy. In the case study, the regression problem is converted into a classification problem for feature selection, and the rMCMR and MCMR algorithms are compared. On the high-dimensional large data set, the rMCMR feature selection algorithm retains more of the original data information while still achieving the goal of feature reduction. Compared with a GBRT model built from the original data, the mean squared error (MSE) is reduced by 19.57% to 45.09%. In forecasting commodity sales and planning the sub-warehouse allocation, fusing multiple algorithm models reduces the total inventory cost by 0.6 to 1.97 million compared with using a single algorithm model, and the inventory cost of the offline forecast is ultimately 1.08 million, which verifies the effectiveness of model fusion.
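As a companion to point 3), the sketch below shows one way to bin a continuous demand target so that a classification-oriented selector can rank features, after which a GBRT regressor is fitted on the surviving columns. The bin count, the use of SelectKBest with mutual information, and the function name select_then_regress are illustrative assumptions, not the procedure documented in the thesis.

    # Assumed illustration of the regression-to-classification trick in point 3).
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.model_selection import train_test_split

    def select_then_regress(X, y, n_bins=10, k=30):
        # Discretize the continuous demand target into quantile-based classes.
        edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
        y_class = np.digitize(y, edges)
        # Rank features against the discretized target and keep the top k.
        selector = SelectKBest(mutual_info_classif, k=k).fit(X, y_class)
        X_sel = selector.transform(X)
        # Regress on the selected subset and report a hold-out MSE.
        X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, test_size=0.2, random_state=0)
        model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
        mse = float(np.mean((model.predict(X_te) - y_te) ** 2))
        return selector, model, mse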
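For the multi-model fusion in point 4), a minimal sketch might blend a Random Forest and a GBRT regressor with a validation-tuned weight. The weighted-average blending, the 0-to-1 weight grid, and the hyperparameters are assumptions for illustration; the thesis's actual fusion scheme may differ.

    # Assumed illustration of multi-model fusion: blend two regressors by a tuned weight.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
    from sklearn.metrics import mean_squared_error

    def fuse_models(X_train, y_train, X_val, y_val):
        rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
        gbrt = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
        p_rf, p_gbrt = rf.predict(X_val), gbrt.predict(X_val)
        # Choose the blend weight that minimizes validation MSE.
        weights = np.linspace(0.0, 1.0, 21)
        errors = [mean_squared_error(y_val, w * p_rf + (1 - w) * p_gbrt) for w in weights]
        best_w = float(weights[int(np.argmin(errors))])
        return rf, gbrt, best_w

At prediction time the fused forecast would be best_w * rf.predict(X) + (1 - best_w) * gbrt.predict(X).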
Keywords/Search Tags: rMCMR, Ensemble Learning, Random Forest, Gradient Boosting Regression, Demand Forecasting