
Research On Fuzzy Rough Computing Method Of Unsupervised Knowledge Discovery For Mixed Data

Posted on: 2023-01-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Yuan
GTID: 1528307073479054
Subject: Computer Science and Technology

Abstract/Summary:
With the development of science and technology, a large amount of unlabeled data has flooded into daily life. Discovering novel and valuable knowledge from such data has therefore become particularly important, and this need has driven the continuous development of unsupervised knowledge discovery techniques. Like the general knowledge discovery process, a complete unsupervised knowledge discovery process mainly comprises data preprocessing, unsupervised data mining, pattern evaluation, and knowledge representation. Unsupervised attribute reduction is an important data preprocessing method in this process. It reduces the volume of unlabeled data by removing irrelevant or redundant attributes. Unsupervised outlier detection is likewise an important research direction in unsupervised data mining. Its purpose is to find objects whose behavior differs markedly from what is expected, and it has been applied successfully to internet fraud detection, loan approval, medical diagnosis, and other areas. However, most unsupervised attribute reduction and unsupervised outlier detection methods apply only to data with a single attribute type, either nominal or numerical. Real-world unlabeled data, in contrast, often mix numerical and nominal attributes. Designing attribute reduction and outlier detection algorithms for unlabeled mixed data is therefore a challenging problem. Fuzzy rough set theory is one of the important mathematical models of granular computing and has been successfully applied to supervised knowledge discovery on mixed data. Its application to unsupervised knowledge discovery on mixed data, however, is still preliminary and needs further study. This dissertation therefore takes fuzzy rough sets as its theoretical basis and focuses on unsupervised attribute reduction and unsupervised outlier detection methods for mixed data. The main research results are as follows.

(1) For the fuzzy rough computing problem of unsupervised mixed attribute selection, two unsupervised attribute selection methods for mixed data are studied. The first is an unsupervised hybrid attribute selection method based on fuzzy dependency. The dependency of an attribute subset is defined by averaging its dependencies with respect to every single-attribute set. An attribute significance measure is then derived from this dependency to characterize the importance of a candidate attribute. Finally, a fuzzy rough set-based unsupervised attribute reduction algorithm is designed. The second method proposes a new kernel fuzzy complementary entropy. It is built on a hybrid kernel function and takes the complementary information of fuzzy granules into account. Based on this entropy, kernel fuzzy complementary joint entropy, kernel fuzzy complementary conditional entropy, and kernel fuzzy complementary mutual information are defined. The kernel fuzzy complementary conditional entropy and mutual information are then proved to vary monotonically with the attribute set. From these uncertainty measures, three kinds of attribute significance are proposed. Finally, a corresponding generalized unsupervised heuristic attribute selection algorithm is designed.

(2) For the fuzzy rough computing problem of unsupervised mixed attribute ranking, two unsupervised attribute ranking methods for mixed data are proposed. The first is an unsupervised mixed attribute ranking method based on fuzzy mutual information. Fuzzy mutual information is used to define the fuzzy relevance of each feature, and the feature with the largest fuzzy relevance is selected first. Fuzzy conditional relevance is then defined to characterize the relevance of a feature when another feature is known. Fuzzy redundancy is derived from it to characterize the redundancy of a candidate feature. These quantities are combined into an unsupervised minimum redundancy-maximum relevance index of feature importance for subsequent feature selection. Finally, a fuzzy mutual information-based unsupervised feature selection algorithm is designed. The second is an unsupervised interactive mixed attribute ranking method based on fuzzy complementary entropy. Starting from fuzzy complementary entropy, the following measures are defined: fuzzy complementary joint entropy, fuzzy complementary conditional entropy, fuzzy complementary mutual information, and fuzzy complementary conditional mutual information. The relationships among these uncertainty measures are also discussed. Fuzzy complementary joint entropy, mutual information, and conditional mutual information then define the criteria of maximum relevance, minimum redundancy, and maximum interactivity, respectively. These criteria express the importance of attributes, the redundancy among them, and the interaction between them. Together they yield an unsupervised maximum information-minimum redundancy-maximum interaction index of attribute importance. Finally, an exploratory unsupervised interactive attribute reduction algorithm is designed.

(3) For the fuzzy rough computing problem of unsupervised outlier detection on mixed attribute data, two unsupervised outlier detection models for mixed data are investigated. The first is an outlier detection model based on fuzzy rough granules. First, fuzzy approximation accuracy is defined. Based on it, a granule outlier degree is constructed to characterize how outlying each fuzzy information granule is. Next, an outlier factor based on fuzzy rough granules is constructed to characterize the outlier degree of each data object. It combines the granule outlier degrees with corresponding weights. Finally, a fuzzy rough granule-based outlier detection algorithm is designed. The second is a fuzzy information entropy-based outlier detection model for mixed attribute data. Fuzzy information entropy is used to define fuzzy relative entropy, and a basic measure characterizing each data object is constructed from it. Finally, a fuzzy information entropy-based outlier factor is defined to perform outlier detection, and the corresponding fuzzy information entropy-based outlier detection algorithm is designed.

Experimental comparison and analysis of these results are carried out on several public data sets. The results show that both the proposed unsupervised attribute reduction algorithms and the proposed unsupervised outlier detection algorithms achieve better effectiveness and adaptability. In addition, the proposed models build fuzzy rough set models on optimized mixed similarity measures. The algorithms are therefore applicable to nominal, numerical, and mixed attribute data. The methods developed in this dissertation address the problem of knowledge discovery for unlabeled mixed data and extend the application of fuzzy rough set theory to unsupervised knowledge discovery.
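The fuzzy dependency-based selection in (1) can be illustrated with a small sketch. It is not the dissertation's algorithm. The sketch assumes one common hybrid similarity relation from the fuzzy rough set literature: 0/1 matching for nominal attributes, and 1 - |d| within a radius std/`lam` for min-max scaled numerical attributes. It also assumes elementwise-min intersection of attribute relations and the Kleene-Dienes implicator for the lower approximation. The helper names (`relation`, `subset_relation`, `dependency`, `greedy_reduct`) and the parameter `lam` are hypothetical. The dependency of a subset B is the mean of its dependency degrees with respect to each single attribute. Candidates are added greedily until that value reaches the full-set dependency.

```python
from statistics import pstdev
from typing import List, Sequence, Union

Value = Union[float, str]
Matrix = List[List[float]]


def relation(column: Sequence[Value], numeric: bool, lam: float = 1.0) -> Matrix:
    """Assumed hybrid fuzzy similarity matrix for one attribute."""
    n = len(column)
    if not numeric:
        # Nominal attribute: crisp equality.
        return [[1.0 if column[i] == column[j] else 0.0 for j in range(n)] for i in range(n)]
    # Numerical attribute: min-max scale, then 1 - |d| inside the radius std / lam.
    lo, hi = min(column), max(column)
    vals = [0.0 if hi == lo else (v - lo) / (hi - lo) for v in column]
    eps = pstdev(vals) / lam
    return [[1.0 - abs(a - b) if abs(a - b) <= eps else 0.0 for b in vals] for a in vals]


def subset_relation(rels: List[Matrix], subset: Sequence[int]) -> Matrix:
    """R_B as the elementwise min of the attribute relations; R of the empty set is all ones."""
    n = len(rels[0])
    out = [[1.0] * n for _ in range(n)]
    for a in subset:
        out = [[min(out[i][j], rels[a][i][j]) for j in range(n)] for i in range(n)]
    return out


def dependency(rels: List[Matrix], subset: Sequence[int]) -> float:
    """Average over single attributes a of gamma_B({a}).

    gamma_B({a}) is the mean over objects x of the Kleene-Dienes lower
    approximation of x's own granule [x]_a under R_B:
    min_y max(1 - R_B(x, y), R_a(x, y)).
    """
    rb = subset_relation(rels, subset)
    n = len(rb)
    total = 0.0
    for ra in rels:
        pos = sum(min(max(1.0 - rb[i][j], ra[i][j]) for j in range(n)) for i in range(n))
        total += pos / n
    return total / len(rels)


def greedy_reduct(rels: List[Matrix], tol: float = 1e-9) -> List[int]:
    """Add the most significant attribute until gamma reaches gamma of the full set."""
    full = dependency(rels, range(len(rels)))
    selected: List[int] = []
    remaining = list(range(len(rels)))
    while remaining and dependency(rels, selected) < full - tol:
        # Significance gamma(B + c) - gamma(B) shares the baseline gamma(B),
        # so ranking candidates by gamma(B + c) is equivalent.
        best = max(remaining, key=lambda c: dependency(rels, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy mixed data: two numerical attributes and one nominal attribute.
rows = [
    (1.0, 10.0, "a"),
    (1.2, 12.0, "a"),
    (3.0, 30.0, "b"),
    (3.1, 31.0, "b"),
    (5.0, 50.0, "c"),
    (9.0, 20.0, "a"),
]
numeric = [True, True, False]
columns = [list(col) for col in zip(*rows)]
rels = [relation(col, is_num) for col, is_num in zip(columns, numeric)]
reduct = greedy_reduct(rels)
print("reduct:", reduct, "dependency:", round(dependency(rels, reduct), 3))
```

Enlarging B can only shrink R_B, so this dependency never decreases as attributes are added. The loop therefore stops after at most |C| steps.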
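The complementary entropy family used in (1) and (2) can be sketched in the same way. This sketch replaces the dissertation's hybrid kernel with an assumed one: a Gaussian kernel on min-max scaled numerical values and exact matching on nominal values. The bandwidth `sigma` and all function names are hypothetical. It uses the complementary-entropy form E(B) = (1/n) sum_i (1 - |[x_i]_B| / n), with elementwise-min intersection for joint granules. Conditional entropy and mutual information are taken as E(B | A) = E(A, B) - E(A) and I(A; B) = E(A) + E(B) - E(A, B). Under these assumptions, the monotonicity described above can be checked numerically.

```python
import math
from typing import List, Sequence, Union

Value = Union[float, str]
Matrix = List[List[float]]


def kernel_relation(column: Sequence[Value], numeric: bool, sigma: float = 0.2) -> Matrix:
    """Assumed hybrid kernel: Gaussian on scaled numbers, exact match on nominals."""
    n = len(column)
    if not numeric:
        return [[1.0 if column[i] == column[j] else 0.0 for j in range(n)] for i in range(n)]
    lo, hi = min(column), max(column)
    vals = [0.0 if hi == lo else (v - lo) / (hi - lo) for v in column]
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in vals] for a in vals]


def granules(rels: List[Matrix], subset: Sequence[int]) -> Matrix:
    """Joint fuzzy granules: elementwise min over the attributes in subset."""
    n = len(rels[0])
    out = [[1.0] * n for _ in range(n)]
    for a in subset:
        out = [[min(out[i][j], rels[a][i][j]) for j in range(n)] for i in range(n)]
    return out


def comp_entropy(rels: List[Matrix], subset: Sequence[int]) -> float:
    """E(B) = (1/n) * sum_i (1 - |[x_i]_B| / n), where |.| is the fuzzy cardinality."""
    rb = granules(rels, subset)
    n = len(rb)
    return sum(1.0 - sum(row) / n for row in rb) / n


def comp_joint(rels: List[Matrix], a_set: Sequence[int], b_set: Sequence[int]) -> float:
    """E(A, B): entropy of the intersected granules of A and B."""
    return comp_entropy(rels, list(a_set) + list(b_set))


def comp_conditional(rels: List[Matrix], b_set: Sequence[int], a_set: Sequence[int]) -> float:
    """E(B | A) = E(A, B) - E(A)."""
    return comp_joint(rels, a_set, b_set) - comp_entropy(rels, a_set)


def comp_mutual_info(rels: List[Matrix], a_set: Sequence[int], b_set: Sequence[int]) -> float:
    """I(A; B) = E(A) + E(B) - E(A, B)."""
    return comp_entropy(rels, a_set) + comp_entropy(rels, b_set) - comp_joint(rels, a_set, b_set)


rows = [
    (1.0, 10.0, "a"),
    (1.2, 12.0, "a"),
    (3.0, 30.0, "b"),
    (3.1, 31.0, "b"),
    (5.0, 50.0, "c"),
    (9.0, 20.0, "a"),
]
numeric = [True, True, False]
columns = [list(col) for col in zip(*rows)]
rels = [kernel_relation(col, is_num) for col, is_num in zip(columns, numeric)]
target = [2]  # the nominal attribute plays the role of B
for cond in ([0], [0, 1]):
    print(cond, round(comp_conditional(rels, target, cond), 4),
          round(comp_mutual_info(rels, cond, target), 4))
```

With min-intersection, E(B | A) reduces to the mean of max(0, R_A - R_B) over all object pairs. Adding attributes to A can only lower R_A, which is why the conditional entropy shrinks and the mutual information grows in this sketch.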
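For the fuzzy mutual information-based ranking in (2), the following is a minimal sketch under stated assumptions. It uses the fuzzy information entropy H(B) = -(1/n) sum_i log2(|[x_i]_B| / n) and the fuzzy mutual information I(a; b) = H(a) + H(b) - H(a, b). The relevance of an attribute is assumed to be its average mutual information with all attributes. Redundancy is the classic mRMR average mutual information with attributes already ranked. This swaps in the standard mRMR difference criterion; the dissertation's fuzzy conditional relevance and fuzzy redundancy definitions are not reproduced. The hybrid relation, the parameter `lam`, and all function names are hypothetical.

```python
import math
from statistics import pstdev
from typing import List, Sequence, Union

Value = Union[float, str]
Matrix = List[List[float]]


def relation(column: Sequence[Value], numeric: bool, lam: float = 1.0) -> Matrix:
    """Assumed hybrid fuzzy similarity: 0/1 for nominals, thresholded 1 - |d| for numbers."""
    n = len(column)
    if not numeric:
        return [[1.0 if column[i] == column[j] else 0.0 for j in range(n)] for i in range(n)]
    lo, hi = min(column), max(column)
    vals = [0.0 if hi == lo else (v - lo) / (hi - lo) for v in column]
    eps = pstdev(vals) / lam
    return [[1.0 - abs(a - b) if abs(a - b) <= eps else 0.0 for b in vals] for a in vals]


def fuzzy_entropy(rels: List[Matrix], subset: Sequence[int]) -> float:
    """H(B) = -(1/n) * sum_i log2(|[x_i]_B| / n); |[x_i]_B| >= 1 because R(x, x) = 1."""
    n = len(rels[0])
    total = 0.0
    for i in range(n):
        # Fuzzy cardinality of the joint granule of x_i (elementwise min over B).
        card = sum(min((rels[a][i][j] for a in subset), default=1.0) for j in range(n))
        total += math.log2(card / n)
    return -total / n


def fuzzy_mi(rels: List[Matrix], a: int, b: int) -> float:
    """I(a; b) = H(a) + H(b) - H(a, b)."""
    return fuzzy_entropy(rels, [a]) + fuzzy_entropy(rels, [b]) - fuzzy_entropy(rels, [a, b])


def mrmr_rank(rels: List[Matrix]) -> List[int]:
    """Rank attributes by relevance minus mean redundancy with those already ranked."""
    m = len(rels)
    mi = [[fuzzy_mi(rels, a, b) for b in range(m)] for a in range(m)]
    relevance = [sum(row) / m for row in mi]  # mean fuzzy MI with every attribute
    ranking = [max(range(m), key=relevance.__getitem__)]
    remaining = [a for a in range(m) if a != ranking[0]]
    while remaining:
        best = max(
            remaining,
            key=lambda a: relevance[a] - sum(mi[a][s] for s in ranking) / len(ranking),
        )
        ranking.append(best)
        remaining.remove(best)
    return ranking


# Column 1 is a rescaled copy of column 0, so it carries redundant information.
rows = [
    (1.0, 2.0, "a", 0.3),
    (1.1, 2.2, "a", 0.9),
    (2.9, 5.8, "b", 0.4),
    (3.0, 6.0, "b", 0.8),
    (5.2, 10.4, "c", 0.1),
    (5.0, 10.0, "c", 0.5),
]
numeric = [True, True, False, True]
columns = [list(col) for col in zip(*rows)]
rels = [relation(col, is_num) for col, is_num in zip(columns, numeric)]
ranking = mrmr_rank(rels)
print("ranking:", ranking)
```

The ranking itself is the output. Selecting a subset then amounts to keeping a prefix of it.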
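The entropy-based outlier model in (3) can be sketched as a leave-one-out computation. This is a simplified illustration, not the dissertation's model. It assumes a hybrid fuzzy relation and the fuzzy information entropy H(U) = -(1/|U|) sum_i log2(|[x_i]| / |U|), both restated in the code. It defines a hypothetical relative entropy RE_a(x) = 1 - H_a(U - {x}) / H_a(U) when removing x lowers the entropy of attribute a, and 0 otherwise. The outlier factor is the unweighted mean of RE_a(x) over attributes. The dissertation's base measure and weighting are not reproduced, and all function names are hypothetical.

```python
import math
from statistics import pstdev
from typing import List, Sequence, Union

Value = Union[float, str]
Matrix = List[List[float]]


def relation(column: Sequence[Value], numeric: bool, lam: float = 1.0) -> Matrix:
    """Assumed hybrid fuzzy similarity: 0/1 for nominals, thresholded 1 - |d| for numbers."""
    n = len(column)
    if not numeric:
        return [[1.0 if column[i] == column[j] else 0.0 for j in range(n)] for i in range(n)]
    lo, hi = min(column), max(column)
    vals = [0.0 if hi == lo else (v - lo) / (hi - lo) for v in column]
    eps = pstdev(vals) / lam
    return [[1.0 - abs(a - b) if abs(a - b) <= eps else 0.0 for b in vals] for a in vals]


def entropy_on(rel: Matrix, objs: Sequence[int]) -> float:
    """Fuzzy information entropy of one attribute restricted to the objects in objs."""
    n = len(objs)
    return -sum(math.log2(sum(rel[i][j] for j in objs) / n) for i in objs) / n


def relative_entropy(rel: Matrix, x: int) -> float:
    """1 - H(U - {x}) / H(U) if removing x lowers the entropy, else 0 (hypothetical form)."""
    everyone = list(range(len(rel)))
    h_all = entropy_on(rel, everyone)
    if h_all == 0.0:
        return 0.0  # all objects are indistinguishable, so none stands out
    h_without = entropy_on(rel, [i for i in everyone if i != x])
    return 1.0 - h_without / h_all if h_without < h_all else 0.0


def outlier_factors(rels: List[Matrix]) -> List[float]:
    """Unweighted mean of the per-attribute relative entropies for each object."""
    n = len(rels[0])
    return [sum(relative_entropy(rel, x) for rel in rels) / len(rels) for x in range(n)]


rows = [
    (0.00, "a"),
    (0.10, "a"),
    (0.20, "a"),
    (0.15, "a"),
    (5.00, "b"),  # far from the others on both attributes
]
numeric = [True, False]
columns = [list(col) for col in zip(*rows)]
rels = [relation(col, is_num) for col, is_num in zip(columns, numeric)]
scores = outlier_factors(rels)
print([round(s, 3) for s in scores])
```

Removing an isolated object deletes its large -log2(1/n) term from the average, so its relative entropy is close to 1. Removing a typical object tends to raise the entropy instead, so it scores 0 in this sketch.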
Keywords/Search Tags: Unsupervised knowledge discovery, Unsupervised attribute reduction, Unsupervised outlier detection, Granular computing, Fuzzy rough set theory, Mixed data