Enterprises are the basic units of the national economy and the main participants in market economic activity. Their financial condition affects not only their own development but also national politics, the economy, and people’s livelihoods, and it even plays a significant role in global asset allocation and the stability of the world economy. Accurately predicting the financial situation of enterprises can therefore provide management suggestions for the enterprises themselves as well as decision support for their stakeholders. However, datasets used for financial prediction usually exhibit prominent and typical data-trait problems, such as missing data, class imbalance, and noise, which can seriously degrade the usability and performance of prediction models. Focusing on enterprise financial distress prediction, this paper therefore constructs three financial distress prediction models, each addressing a different data-trait problem, based on the ideas and principles of case-based reasoning (CBR). The main research contents and innovations are as follows.

(1) CBR-driven financial distress prediction with missing data. To address missing financial data, this work proposes a CBR-driven ensemble learning paradigm for financial distress prediction with missing data. The paradigm involves three main stages: CBR-driven missing data imputation, CBR-driven single-classifier prediction, and CBR-driven ensemble output. In the first stage, a CBR-driven missing data imputation method fills in the missing values in the initial dataset. In the second stage, three CBR-driven single classification models, using Manhattan, Euclidean, and cosine distance respectively, are constructed to predict financial distress. In the third stage, a weighted majority voting strategy combines the predictions of these single classification models to obtain the final prediction. The main innovations of this work are a CBR-driven missing data imputation method, proposed for the first time, that handles missing data at different missing rates, and a CBR-driven weighted ensemble classification model that improves the accuracy and robustness of financial distress prediction.

(2) CBR-driven financial distress prediction with missing data and class imbalance. To address missing and class-imbalanced financial data, this work proposes a two-stage CBR-driven classification learning paradigm. The two stages are CBR-driven missing data imputation and learning vector quantization (LVQ)-CBR-driven classifier prediction. In the first stage, a hybrid CBR-driven weighted imputation method fills in the missing values in the initial dataset; this method measures the similarity between cases with both Manhattan and Euclidean distances and assigns weights to the retrieved similar cases. In the second stage, the LVQ algorithm first clusters the samples in the training dataset, and each cluster serves as a case sub-library of the CBR classifier. The CBR classifier then predicts financial distress: each test sample is treated as a target case that goes through retrieval, learning, and retention in its nearest case sub-library. The main innovations of this work are a hybrid CBR-driven weighted imputation method that further improves imputation performance and an LVQ-CBR-driven classification model that improves prediction accuracy for both minority-class samples and the sample set as a whole.

(3) CBR-driven financial distress prediction with missing data and noise. To address missing and noisy financial data, this work proposes a learning paradigm that combines CBR-driven clustering imputation with noise-resistant classification. The two stages are CBR-driven missing data imputation and CBR-driven noise-resistant classifier prediction. In the first stage, a clustering-based CBR-driven imputation method produces a complete sample dataset: the k-means algorithm first clusters the samples, and the hybrid CBR-driven weighted imputation method then fills in the missing values within each resulting data subset. In the second stage, a CBR-driven noise-resistant classification model is constructed to predict financial distress. This model uses the k-means algorithm to identify class noise samples in the case library and introduces a confidence measure into each step of the CBR classification process to mark class noise samples and reduce their interference with the classification model. The main innovations of this work are a clustering-based CBR-driven imputation method, which overcomes the poor performance of a single imputation method when missing data are unevenly distributed, and a CBR-driven noise-resistant classification model, which addresses the low prediction accuracy caused by class noise samples.

In summary, starting from the ideas and principles of CBR, this paper constructs three financial distress prediction models that target the prominent data traits of missing values, class imbalance, and noise in financial prediction datasets. Validation on different datasets shows that these models effectively improve the accuracy of enterprise financial situation prediction. In addition, the models are highly interpretable and can provide business advice and management insights for enterprises. This paper therefore offers both theoretical value and practical reference value for enterprise financial distress prediction.
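The abstract does not specify how the CBR-driven imputation in model (1) retrieves and reuses cases. As a minimal sketch, assuming a standard k-nearest-neighbour retrieve-and-reuse step (plain k-NN imputation, named as such here), each incomplete case is compared with the complete cases on its observed features only, and each missing value is replaced by the mean of that feature over the k most similar cases. The function name `cbr_impute`, the choice of Euclidean distance, and the default `k=3` are illustrative assumptions, not the author's method.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value (assumed encoding)


def euclidean(a, b) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cbr_impute(data: List[Row], k: int = 3) -> List[List[float]]:
    """Fill each missing value with the mean of that feature over the k most
    similar complete cases (a k-NN-style retrieve-and-reuse step)."""
    complete = [r for r in data if all(v is not None for v in r)]
    if not complete:
        raise ValueError("need at least one complete case in the case library")
    result = []
    for row in data:
        observed = [j for j, v in enumerate(row) if v is not None]
        if len(observed) == len(row):
            result.append(list(row))
            continue
        # Retrieve: rank complete cases by distance on the observed features only
        ranked = sorted(
            complete,
            key=lambda c: euclidean([row[j] for j in observed], [c[j] for j in observed]),
        )
        neighbours = ranked[:k]
        # Reuse: average the neighbours' values for each missing feature
        filled = [
            v if v is not None else sum(c[j] for c in neighbours) / len(neighbours)
            for j, v in enumerate(row)
        ]
        result.append(filled)
    return result
```

Restricting the distance to observed features lets an incomplete case be compared with the case library without first guessing its missing values.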
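For the second and third stages of model (1), the sketch below builds one CBR classifier per distance measure (Manhattan, Euclidean, cosine), each reusing the majority label of its k most similar cases, and combines their outputs by weighted majority voting. How the abstract's models set their voting weights is not stated; here the weights are simply passed in (one plausible choice would be each model's validation accuracy). The helper names and `k=3` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def manhattan(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0  # treat zero vectors as maximally distant


def cbr_classify(case_base, labels, target, dist, k=3):
    """Retrieve the k most similar cases under `dist` and reuse their majority label."""
    ranked = sorted(range(len(case_base)), key=lambda i: dist(case_base[i], target))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def weighted_majority_vote(predictions: List[int], weights: List[float]) -> int:
    """Add each classifier's weight to the label it predicts; the highest total wins."""
    scores: Dict[int, float] = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(scores, key=scores.get)


def ensemble_predict(case_base, labels, target, weights, k=3):
    dists: List[Callable[[Vector, Vector], float]] = [manhattan, euclidean, cosine_distance]
    preds = [cbr_classify(case_base, labels, target, d, k) for d in dists]
    return weighted_majority_vote(preds, weights)
```

Weighting lets a more reliable distance measure outvote a weaker one; with equal weights the scheme reduces to plain majority voting.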
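Model (2) measures case similarity with both Manhattan and Euclidean distances and weights the retrieved cases. The abstract gives neither the weighting nor the combination rule, so this sketch assumes inverse-distance weights over each metric's k nearest complete cases and a convex blend `alpha * manhattan_estimate + (1 - alpha) * euclidean_estimate`. The names `hybrid_weighted_impute` and `weighted_estimate`, and the parameters `alpha`, `k`, and `eps`, are hypothetical.

```python
import math


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_estimate(row, complete, j, dist, k, eps=1e-9):
    """Inverse-distance-weighted estimate of feature j from the k most similar cases."""
    observed = [i for i, v in enumerate(row) if v is not None]
    scored = sorted(
        ((dist([row[i] for i in observed], [c[i] for i in observed]), c) for c in complete),
        key=lambda t: t[0],
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in scored]  # closer cases weigh more
    return sum(w * c[j] for w, (_, c) in zip(weights, scored)) / sum(weights)


def hybrid_weighted_impute(data, k=3, alpha=0.5):
    """Blend Manhattan- and Euclidean-based estimates for every missing value."""
    complete = [r for r in data if None not in r]
    out = []
    for row in data:
        filled = list(row)
        for j, v in enumerate(row):
            if v is None:
                m = weighted_estimate(row, complete, j, manhattan, k)
                e = weighted_estimate(row, complete, j, euclidean, k)
                filled[j] = alpha * m + (1 - alpha) * e
        out.append(filled)
    return out
```

The small `eps` keeps the weight finite when a retrieved case matches the target exactly on the observed features.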
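The second stage of model (2) can be sketched as follows, assuming the standard LVQ1 update rule (the nearest prototype moves toward a training sample of its own class and away from a sample of another class) with one prototype per class by default. Each training case is then placed in the sub-library of its nearest prototype; a test case is routed to its nearest sub-library, labelled by majority vote over its k most similar cases there, and retained in that sub-library with the predicted label. The prototype count, the linearly decaying learning rate, the epoch count, `k`, and all names are assumptions.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_lvq1(X, y, protos_per_class=1, lr=0.1, epochs=30, seed=0):
    """LVQ1: pull the nearest prototype toward same-class samples, push it away otherwise."""
    rng = random.Random(seed)
    protos, proto_labels = [], []
    for cls in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == cls]
        for x in rng.sample(members, min(protos_per_class, len(members))):
            protos.append(list(x))
            proto_labels.append(cls)
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            p = min(range(len(protos)), key=lambda q: euclidean(protos[q], X[i]))
            sign = 1.0 if proto_labels[p] == y[i] else -1.0
            protos[p] = [w + sign * rate * (x - w) for w, x in zip(protos[p], X[i])]
    return protos, proto_labels


class LVQCBRClassifier:
    def __init__(self, X, y, k=3, **lvq_kwargs):
        self.k = k
        self.protos, _ = train_lvq1(X, y, **lvq_kwargs)
        # Each prototype owns a case sub-library: the training cases nearest to it
        self.libraries = [[] for _ in self.protos]
        for x, lab in zip(X, y):
            self.libraries[self._nearest(x)].append((list(x), lab))

    def _nearest(self, x):
        return min(range(len(self.protos)), key=lambda q: euclidean(self.protos[q], x))

    def predict(self, target):
        library = self.libraries[self._nearest(target)]
        # Fall back to the whole case base if this sub-library happens to be empty
        pool = library or [c for lib in self.libraries for c in lib]
        # Retrieve the k most similar cases and reuse their majority label
        neighbours = sorted(pool, key=lambda c: euclidean(c[0], target))[: self.k]
        label = Counter(lab for _, lab in neighbours).most_common(1)[0][0]
        # Retain: the solved target case joins its nearest sub-library
        library.append((list(target), label))
        return label
```

Searching only the nearest sub-library keeps retrieval local, so minority-class cases are less likely to be swamped by majority-class cases from distant regions; the empty-library fallback is a defensive choice of this sketch.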
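For the first stage of model (3), this sketch clusters the samples with k-means and then imputes within each cluster using only that cluster's complete cases. Two details are assumptions not stated in the abstract: incomplete rows receive a temporary column-mean fill solely so they can be assigned to clusters, and the hybrid weighted step is approximated by averaging Manhattan- and Euclidean-based inverse-distance k-NN estimates. All function names and parameters are hypothetical.

```python
import math
import random


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, n_clusters, iters=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(n_clusters), key=lambda c: euclidean(p, centroids[c])) for p in points]
        for c in range(n_clusters):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def weighted_fill(row, donors, k, eps=1e-9):
    """Average of Manhattan- and Euclidean-based inverse-distance k-NN estimates."""
    obs = [j for j, v in enumerate(row) if v is not None]
    filled = list(row)
    for j, v in enumerate(row):
        if v is not None:
            continue
        estimates = []
        for dist in (manhattan, euclidean):
            near = sorted(donors, key=lambda d: dist([row[i] for i in obs], [d[i] for i in obs]))[:k]
            w = [1.0 / (dist([row[i] for i in obs], [d[i] for i in obs]) + eps) for d in near]
            estimates.append(sum(wi * d[j] for wi, d in zip(w, near)) / sum(w))
        filled[j] = sum(estimates) / len(estimates)
    return filled


def cluster_impute(data, n_clusters=2, k=2):
    cols = list(zip(*data))
    means = [sum(v for v in col if v is not None) / sum(v is not None for v in col) for col in cols]
    # Temporary mean fill, used only to place incomplete rows into clusters
    provisional = [[m if v is None else v for v, m in zip(row, means)] for row in data]
    assign = kmeans(provisional, n_clusters)
    out = []
    for row, c in zip(data, assign):
        donors = [r for r, a in zip(data, assign) if a == c and None not in r]
        if not donors:  # fall back to all complete rows if the cluster has none
            donors = [r for r in data if None not in r]
        out.append(weighted_fill(row, donors, k))
    return out
```

Imputing within clusters means a missing value is estimated only from cases in the same region of the feature space, which is the motivation the abstract gives for handling unevenly distributed missing data.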