| In the era of big data, more and more enterprises realize the importance of customer data, and customer relationship management (CRM) is becoming increasingly important. The ultimate goal of CRM is to maximize customer value, and different customers often have different values. Therefore, identifying the most valuable customers is the key to maximizing profit, and customer classification provides an important tool for this purpose. However, in the era of big data, CRM customer data not only has an imbalanced class distribution but also presents some new characteristics: 1) high-dimensional features, i.e., enterprises can often collect customer data containing a large number of features on a regular basis; 2) complex data structure characteristics, i.e., the customer data may be linearly separable or linearly inseparable. These new characteristics have brought new challenges to customer classification. In this regard, based on the existing data characteristic-driven customer classification research paradigm, this paper proposes complex-valued group method of data handling (CGMDH) neural network customer classification models for class-imbalanced environments. The main research achievements of this paper are as follows: (1) Theoretical framework for customer classification based on the CGMDH neural network. In CRM customer classification research, the characteristics of customer data often influence the performance of the classification model. In recent years, a problem-oriented research paradigm, i.e., the data characteristic-driven customer classification research paradigm, has attracted widespread attention. Most of these studies build models in the real number field. However, existing studies have shown that complex-valued classification models may achieve better classification performance on real-valued classification problems such as customer classification. Therefore, on the basis of this research paradigm, this 
paper combines the group method of data handling (GMDH) neural network with data transformation, complex-valued matrix calculation, resampling, feature clustering, ensemble learning and linearly separable discrimination technologies, and proposes a CGMDH neural network based customer classification research framework. Under this framework, corresponding CGMDH neural network based customer classification models are constructed for customer classification problems with different data characteristics. (2) Circular linear CGMDH neural network model for imbalanced customer classification. To address the shortcomings of the existing phase-encoded linear CGMDH (PE-LCGMDH) neural network in terms of complex-valued transformation and external criteria, this paper proposes a circular linear CGMDH (C-LCGMDH) neural network model for imbalanced customer classification. On the one hand, this model introduces a circular transformation into the complex-valued transformation to overcome the possible shortcomings of the phase-encoded transformation of the existing PE-LCGMDH neural network model. On the other hand, it provides a thorough theoretical analysis of the properties of the complex-valued symmetric regularity criterion (CSRC), which makes the work technically sound and complete, and proposes a logarithmic function based CSRC (Ln CSRC) to overcome the limitations of the CSRC. The C-LCGMDH neural network model consists of three stages: 1) data preprocessing, which first uses random oversampling to balance the class distribution of the real-valued training set, and then uses a complex-valued transformation that introduces a circular transformation to transform the real-valued customer classification data into complex-valued data; 2) training the linear CGMDH neural network model, which uses the complex-valued linear transfer function and the proposed Ln CSRC to obtain the optimal complexity model y_opt; 3) classifying the test samples. The model y_opt is used 
to classify the samples in the complex-valued test set to obtain the complex-valued prediction results, and then a real-valued transformation is used to obtain the final real-valued classification results of the test samples. The experimental results on 23 real-valued classification data sets, which include three customer classification problems, show that: 1) compared with the RGMDH and PE-LCGMDH neural network models, the C-LCGMDH neural network model has the fastest convergence speed and selects the fewest features; 2) the classification performance of the proposed model is significantly better than that of the other four complex-valued and three real-valued models; 3) when dealing with data sets with fewer features, its time complexity is comparable to that of other models; 4) the model is also a white-box model that can give an interpretable expression. Furthermore, the empirical results on a customer classification data set show that the constructed C-LCGMDH neural network model can effectively solve imbalanced CRM customer classification problems, and its customer classification performance is better than that of the other seven models. In addition, we also found that this model is still inadequate for customer classification problems with high-dimensional features or complex data structure characteristics. (3) Fuzzy bi-objective clustering based CGMDH neural network selective ensemble model for imbalanced customer classification with high-dimensional features. In a class-imbalanced environment, feature selection and other dimensionality reduction methods tend to lose a large amount of useful information when applied to CRM customer classification problems with high-dimensional features. To avoid this loss, this paper, inspired by the "feature clustering-ensemble" two-stage research paradigm, proposes a fuzzy bi-objective clustering based CGMDH neural network selective ensemble (FBC-CSE) model. The FBC-CSE model first introduces the cosine distance and the modeling idea of 
min-sum K-clustering, proposes a fuzzy bi-objective clustering (FBC) algorithm and an improved clustering evaluation index to perform feature clustering, and uses NSGA-II to solve the FBC problem and obtain K feature clusters. Next, K training data subsets are obtained by mapping calculation; after random oversampling is used to balance their class distributions, a C-LCGMDH neural network base classifier is trained on each subset, and its classification performance on the validation set is calculated. Finally, according to the classification performance on the validation set, a selective weighted voting ensemble strategy is used to classify the test samples. The empirical results on two real-world high-dimensional customer classification data sets show that: 1) with the same base classifier and ensemble strategy, the constructed FBC algorithm is superior to eight other commonly used feature selection algorithms and three advanced feature clustering algorithms, which indicates that it is effective and can compensate for the difficulty the C-LCGMDH neural network model has with high-dimensional customer classification problems; 2) the selective weighted voting ensemble strategy is superior to the other three commonly used ensemble strategies; 3) the customer classification performance of the proposed FBC-CSE model is better than that of the other four existing "feature clustering-ensemble" two-stage classification models. (4) CGMDH based adaptive decision model for imbalanced customer classification with complex data structure characteristics. In real-world customer classification problems, the data structure of the data used for modeling is usually unknown, and the modeling results are often highly uncertain. Therefore, the constructed customer classification model may not match the complex data structure characteristics, which can lead to unsatisfactory customer classification performance. In a class-imbalanced environment, in order to solve the 
problem of how to construct the most appropriate CGMDH neural network model according to the complex data structure characteristics of customer classification data, this paper introduces linearly separable discrimination technology and constructs a CGMDH neural network based adaptive decision (CGMDH-AD) model. On the basis of extending the C-LCGMDH neural network to a circular quadratic nonlinear CGMDH neural network, this model contains the following four phases: Phase I judges the complex data structure characteristic, where the linearly separable discrimination theorem is used to analyze the complex data structure characteristic of the customer classification data; Phase II is data preprocessing, where random oversampling is used to balance the training set and the real-valued customer classification data are mapped to the complex field; Phase III adaptively selects and trains the CGMDH neural network model, where, according to the judgment result of Phase I, the most appropriate CGMDH neural network model is selected and trained on the complex-valued training set; Phase IV classifies the test set. To analyze the effectiveness and rationality of the CGMDH-AD model, experiments are conducted on 16 real-valued classification data sets; the results show that the classification performance of this model is significantly better than that of the circular linear and quadratic nonlinear CGMDH neural network models. Furthermore, to verify the customer classification performance of this model, an empirical analysis on 14 real-valued customer classification data sets finds that its customer classification performance is significantly better than that of the other seven classification models, and comparable to that of the circular quadratic nonlinear CGMDH neural network model. In addition, the CGMDH-AD model can also effectively identify the key features that differentiate high-value and low-value customers, and its explanatory power is strong. |
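
The random oversampling step used in the preprocessing stage of the models above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_oversample`, the toy data, and the choice to balance every class up to the size of the largest class are assumptions made here for illustration. The complex-valued (circular) transformation that follows this step is not shown, because the abstract does not specify it.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Balance classes by duplicating randomly chosen minority-class rows.

    Hypothetical helper: every class is grown to the size of the largest
    class. The original rows are kept, in order, at the front of the output.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the majority class
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of the original rows that belong to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Sample with replacement until this class reaches the target size.
        for _ in range(target - n):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


# Toy training set (made up): four low-value customers, one high-value.
X = [[0.1, 1.0], [0.2, 0.9], [0.3, 0.8], [0.4, 0.7], [0.9, 0.1]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

After the call, both classes contain four rows. Only the training set would be balanced this way; validation and test sets keep their original class distribution.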