
Research On Deep Unsupervised Categorical Data Mining For Decision Support

Posted on: 2022-04-04 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: X N Gao
GTID: 1480306320474504 | Subject: Management Science and Engineering
Abstract/Summary:
Data-driven decision support is an important part of the decision-making process and consists of three key stages: representation learning, analysis, and evaluation. Structured, unlabeled categorical data is one of the main data types encountered in decision support. Its attribute values are enumerative, not differentiable, and do not support algebraic operations, so existing mining methods cannot precisely measure the relationships between data objects, which reduces the accuracy of the mining results. As a result, decision support for categorical data lacks effective analysis methods: only a few methods with limited mining effectiveness are available, so the correctness and scientific soundness of the final decisions cannot be guaranteed. Deep learning has achieved remarkable results in mining unsupervised structured numerical data and unstructured data, which suggests a new research direction for unsupervised categorical data mining. Addressing the three key stages of the decision support process, this dissertation borrows the basic ideas of deep learning, tackles the difficulties of unsupervised categorical data mining, and proposes effective solutions for categorical data that provide theoretical support and practical methods for decision making.

The main practical problems addressed in this dissertation are as follows:
(1) In the representation learning stage, data must be mapped from the original feature space to the decision space to provide the basis for the subsequent analysis. For categorical data, existing methods struggle to mine the latent features of the data in depth, so they fail to learn representations that accurately express the relationships between data objects, which undermines the subsequent analysis.
(2) In the analysis stage, suitable analysis methods must be selected or developed on the basis of the learned representation to discover potentially useful knowledge and produce high-quality alternatives for decision making. For categorical data, existing methods struggle to measure the relationships between data objects accurately, which leads to inaccurate analysis results.
(3) In the evaluation stage, the effectiveness of the analysis results must be evaluated quantitatively so that the optimal result can be identified, ensuring that decision making is supported by scientifically sound analysis results. For categorical data, existing methods do not fully measure all the relevant information in the analysis results, which reduces the accuracy of the evaluation, so the scientific soundness of the decision support process cannot be ensured.

To solve these practical problems, this dissertation studies deep unsupervised categorical data mining for decision support. The contributions are summarized as follows:
(1) A deep feature learning method for categorical data is proposed. Drawing on the basic ideas of deep unsupervised feature learning and network embedding, it uncovers the real meaning and the relationships hidden in categorical data and expresses them explicitly in the learned representation, laying the data foundation for the decision support process. Existing research mines latent features insufficiently, and its feature learning is sensitive to parameter settings. Furthermore, deep learning methods cannot be applied directly to categorical data because such data are not differentiable and do not support algebraic operations. This dissertation converts structured categorical data into network data, which removes this barrier to applying deep learning to categorical data mining and makes it possible to learn representations that contain the useful information hidden in the original data, preparing the data foundation for the subsequent analysis.
(2) A deep clustering method for categorical data is proposed. Drawing on the basic ideas of deep clustering and network embedding, it accurately measures the relationships between data objects and produces strong clustering results, providing an effective analysis method for the decision support process. Existing research cannot measure the relationships between categorical data objects accurately, which degrades clustering performance. This dissertation constructs and integrates a clustering loss and a feature learning loss to assess these relationships precisely and improve the clustering results, providing a more accurate mining method for decision support.
(3) An internal cluster validity index for categorical data is proposed. It precisely measures the detailed distribution information in categorical clustering results, produces accurate evaluations, is suitable for evaluating deep clustering results, and helps ensure that decision making is supported scientifically. Existing studies rely on the assumption that categorical attribute values are independent, so they measure only the overall performance of each cluster and ignore the detailed distribution of data objects. This dissertation constructs a distance metric for categorical data that satisfies the formal definition of a distance and identifies an effective validation framework. On this basis, the proposed index evaluates categorical clustering results accurately by taking into account as much of the detailed distribution information in the clustering results as possible, ensuring the scientific soundness of the decision support process.
(4) A series of deep unsupervised categorical data mining methods for decision support is established, which can serve as a set of solutions for handling the unlabeled categorical data encountered in decision support. The proposed deep feature learning method, deep clustering method, and internal cluster validity index are applied to support talent recruitment decisions: talents are segmented into multiple clusters, the optimal segmentation is identified, the characteristics of each category of talent are analyzed, and management suggestions are summarized to support recruitment decision making. This application shows that the proposed methods can serve as a set of solutions for the decision support process.
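The abstract says structured categorical data is converted into network data before deep feature learning, but it does not describe the construction. A minimal sketch, assuming one common choice (a bipartite graph with one node per data object and one node per attribute-value pair, which is not necessarily the dissertation's method): two objects that share many attribute values are then joined by many length-2 paths, which is the kind of relationship a network-embedding method can learn from. The function names and the toy talent records are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records; not data from the dissertation.
ATTRIBUTES = ["degree", "field", "city"]
RECORDS = [
    {"degree": "PhD", "field": "CS", "city": "Beijing"},
    {"degree": "PhD", "field": "CS", "city": "Shanghai"},
    {"degree": "BSc", "field": "Math", "city": "Beijing"},
]


def build_object_value_graph(records, attributes):
    """Build an undirected bipartite graph (adjacency sets) linking objects to values.

    Object nodes are ("obj", i); value nodes are ("val", attribute, value), so the
    same value string under two different attributes stays a distinct node.
    """
    adjacency = defaultdict(set)
    for i, record in enumerate(records):
        obj = ("obj", i)
        for attr in attributes:
            val = ("val", attr, record[attr])
            adjacency[obj].add(val)
            adjacency[val].add(obj)
    # Return a plain dict so lookups of unknown nodes raise instead of adding nodes.
    return dict(adjacency)


def count_edges(adjacency):
    # Each undirected edge is stored once per endpoint.
    return sum(len(neighbors) for neighbors in adjacency.values()) // 2


def shared_values(adjacency, i, j):
    """Number of attribute-value nodes objects i and j share (length-2 paths)."""
    return len(adjacency[("obj", i)] & adjacency[("obj", j)])


if __name__ == "__main__":
    graph = build_object_value_graph(RECORDS, ATTRIBUTES)
    print(len(graph), count_edges(graph), shared_values(graph, 0, 1))
```

Here objects 0 and 1 share two value nodes while objects 1 and 2 share none, so the graph already encodes a graded similarity that the raw, non-algebraic attribute values do not.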
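The deep clustering method is described as integrating a clustering loss with a feature learning loss, without giving either. As an illustration only, the sketch below uses the well-known DEC/IDEC-style formulation: soft assignments from a Student's t kernel, a sharpened target distribution, a KL-divergence clustering loss, and a total loss of the form feature loss plus gamma times clustering loss. Whether the dissertation uses this formulation is an assumption; the function names and `gamma=0.1` default are hypothetical, and the sketch only computes loss values (a real model would obtain gradients through an autodiff framework).

```python
import math


def soft_assignments(embeddings, centroids, alpha=1.0):
    """q_ij: Student's t similarity between embedding i and centroid j, row-normalized."""
    q = []
    for z in embeddings:
        weights = []
        for mu in centroids:
            sq = sum((a - b) ** 2 for a, b in zip(z, mu))
            weights.append((1.0 + sq / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        q.append([w / total for w in weights])
    return q


def target_distribution(q):
    """p_ij = (q_ij^2 / f_j) / sum_k (q_ik^2 / f_k), where f_j = sum_i q_ij."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        weights = [row[j] ** 2 / f[j] for j in range(len(row))]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q):
    """Clustering loss KL(P || Q), summed over all objects and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0
    )


def joint_loss(feature_loss, embeddings, centroids, gamma=0.1):
    """Hypothetical joint objective: feature-learning loss + gamma * clustering loss."""
    q = soft_assignments(embeddings, centroids)
    p = target_distribution(q)
    return feature_loss + gamma * kl_divergence(p, q)


# Hypothetical 2-D embeddings of four objects and two cluster centroids.
EMBEDDINGS = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
CENTROIDS = [[0.0, 0.0], [5.0, 5.0]]

if __name__ == "__main__":
    print(joint_loss(0.5, EMBEDDINGS, CENTROIDS))
```

Squaring q before renormalizing pushes each object's target toward its most likely cluster, so minimizing the KL term sharpens the clusters while the feature loss keeps the embedding faithful to the input network.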
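The validity index is said to rest on a categorical distance that satisfies the definition of a distance (non-negativity, identity of indiscernibles, symmetry, triangle inequality), but the metric and the index themselves are not given in the abstract. As a stand-in, the sketch below uses the simple overlap (Hamming) distance, which is a metric but treats attribute values independently, exactly the assumption the dissertation criticizes, and the standard silhouette coefficient as a generic internal validity index. Neither is the proposed method; the toy points and function names are hypothetical.

```python
from itertools import product


def hamming(x, y):
    """Overlap (Hamming) distance: number of attributes on which x and y differ."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y))


def is_metric(dist, points):
    """Brute-force check of the metric axioms on a finite set of distinct points."""
    for x, y in product(points, repeat=2):
        d = dist(x, y)
        # Non-negativity, symmetry, and identity of indiscernibles.
        if d < 0 or d != dist(y, x) or (d == 0) != (x == y):
            return False
    # Triangle inequality over every ordered triple.
    return all(
        dist(x, z) <= dist(x, y) + dist(y, z)
        for x, y, z in product(points, repeat=3)
    )


def silhouette(points, labels, dist):
    """Mean silhouette coefficient, a standard internal cluster validity index."""
    n = len(points)
    clusters = set(labels)
    if not 2 <= len(clusters) <= n - 1:
        raise ValueError("need between 2 and n - 1 clusters")

    def mean_dist(i, cluster):
        members = [j for j in range(n) if labels[j] == cluster and j != i]
        return sum(dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i in range(n):
        if labels.count(labels[i]) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean_dist(i, labels[i])  # cohesion: own cluster
        b = min(mean_dist(i, c) for c in clusters if c != labels[i])  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Hypothetical categorical objects (degree, field, city).
POINTS = [
    ("PhD", "CS", "Beijing"),
    ("PhD", "CS", "Shanghai"),
    ("BSc", "Math", "Beijing"),
    ("BSc", "Math", "Shenzhen"),
]

if __name__ == "__main__":
    print(is_metric(hamming, POINTS))
    print(silhouette(POINTS, [0, 0, 1, 1], hamming))  # coherent split
    print(silhouette(POINTS, [0, 1, 0, 1], hamming))  # poor split
```

Checking the metric axioms matters because indices such as the silhouette compare average within-cluster and between-cluster distances, which only behave sensibly when the distance is a true metric; squaring the Hamming distance, for example, breaks the triangle inequality.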
Keywords/Search Tags: Data Mining, Decision Support, Categorical Data, Unsupervised Data Mining, Deep Learning