
Fuzzy Information Structure, Uncertainty Measure and Feature Selection Based on Categorical Data

Posted on: 2022-12-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Chen
GTID: 2480306773969229
Subject: Computer Software and Application of Computer
Abstract/Summary:
Categorical data is an important type of data in machine learning and data mining. Rough set theory generally handles categorical data through equivalence relations, under which two objects are related when their information values are equal, and then uses these relations to construct information granules. However, the equivalence relation is too strict, which limits the resulting information granules and may filter out potentially useful information. In addition, most feature selection algorithms for categorical data are based on the lower approximation, dependency, or information entropy, and are not well suited to massive and complex big data environments: because their time complexity grows with the number of samples, they consume a great deal of storage space and computing time. This thesis studies fuzzy information structures and new uncertainty measures from the perspective that "the equality of information values is fed back to the feature set", and proposes two feature selection algorithms for categorical data. The main work consists of the following three parts:
(1) A fuzzy symmetric relation based on equal information values is established, and the corresponding fuzzy information granules are constructed. The fuzzy information structure is represented as a vector, and the relationships between fuzzy information structures are studied by means of the inclusion degree.
(2) Four new uncertainty measures for categorical information systems are proposed. Their properties are given, and numerical experiments and statistical tests are performed to evaluate the performance of the new measures. In addition, entropy measures for categorical decision information systems (fuzzy information entropy, fuzzy conditional information entropy, and fuzzy joint entropy) are proposed and their properties are discussed.
(3) Using two of the new measures, fuzzy information granularity and fuzzy information entropy, a feature selection algorithm for categorical information systems is given based on each, and an experimental analysis is carried out. To address the cumbersome calculation formulas and the high time and space complexity of most feature selection algorithms for categorical data, an iterative formula linking fuzzy conditional information entropy and attribute importance is given, and a feature selection algorithm for categorical data based on an iterative fuzzy conditional information entropy model and matrix operations is proposed. Numerical experiments and statistical tests are carried out to evaluate the performance of the proposed algorithm.
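The abstract names the ingredients of the entropy-based feature selection algorithm without giving its formulas, so the following Python sketch illustrates one plausible reading under explicit assumptions; it is not the thesis's implementation. It assumes (a) the fuzzy relation r_P(x, y) is the fraction of attributes in P on which x and y take equal values; (b) granularity, entropy and conditional entropy take common forms from the fuzzy rough set literature, with granule intersection taken as element-wise min; and (c) the "iterative formula" is stood in for by an incremental update of an agreement-count matrix, C_{P+a} = C_P + E_a. The toy table and all function names (`agreement_matrix`, `select_features`, etc.) are hypothetical.

```python
import math

# Hypothetical toy categorical decision table (not from the thesis):
# rows are objects x_0..x_3, columns are condition attributes a_0..a_2.
TABLE = [
    ["x", "p", "u"],
    ["x", "q", "u"],
    ["y", "p", "u"],
    ["y", "q", "v"],
]
DECISION = [0, 0, 1, 1]  # decision attribute d (class label of each object)


def agreement_matrix(column):
    """E[i][j] = 1 if objects i and j take the same value in `column`, else 0."""
    n = len(column)
    return [[1 if column[i] == column[j] else 0 for j in range(n)] for i in range(n)]


def relation(counts, k):
    """Assumed fuzzy symmetric relation: r_P(x_i, x_j) = C_P[i][j] / |P|,
    i.e. the fraction of the k attributes in P on which x_i and x_j agree."""
    return [[c / k for c in row] for row in counts]


def granule_sizes(rel):
    """Cardinality |[x_i]_P| = sum_j r_P(x_i, x_j) of each fuzzy granule."""
    return [sum(row) for row in rel]


def granularity(rel):
    """Assumed fuzzy information granularity: (1/n^2) * sum_i |[x_i]_P|."""
    n = len(rel)
    return sum(granule_sizes(rel)) / (n * n)


def entropy(rel):
    """Assumed fuzzy information entropy: -(1/n) * sum_i log2(|[x_i]_P| / n)."""
    n = len(rel)
    return sum(-math.log2(s / n) for s in granule_sizes(rel)) / n


def conditional_entropy(rel, dec_rel):
    """Assumed fuzzy conditional entropy H(D|P) =
    -(1/n) * sum_i log2(|[x_i]_P min [x_i]_D| / |[x_i]_P|)."""
    n = len(rel)
    total = 0.0
    for i in range(n):
        # Both relations are reflexive, so inter >= 1 and log2 is defined.
        inter = sum(min(rel[i][j], dec_rel[i][j]) for j in range(n))
        total -= math.log2(inter / sum(rel[i]))
    return total / n


def add(m1, m2):
    """Element-wise sum of two square matrices stored as nested lists."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def select_features(table, decision, eps=1e-9):
    """Greedy forward selection: repeatedly add the attribute a with the largest
    importance H(D|P) - H(D|P + a), stopping when no attribute lowers H(D|P).
    The count matrix is updated iteratively, C_{P+a} = C_P + E_a, instead of
    recomputing the relation of P + a from every attribute in P."""
    n, m = len(table), len(table[0])
    agree = [agreement_matrix([row[a] for row in table]) for a in range(m)]
    dec_rel = agreement_matrix(decision)  # crisp equivalence relation of d
    counts = [[0] * n for _ in range(n)]  # C_P for the current subset P
    # With P empty, treat every pair as fully related, so H(D|P) = H(D).
    current = conditional_entropy([[1.0] * n for _ in range(n)], dec_rel)
    selected, remaining = [], list(range(m))
    while remaining:
        best, best_h = None, current
        for a in remaining:
            trial = add(counts, agree[a])
            h = conditional_entropy(relation(trial, len(selected) + 1), dec_rel)
            if h < best_h - eps:
                best, best_h = a, h
        if best is None:  # no remaining attribute reduces H(D|P)
            break
        counts = add(counts, agree[best])
        selected.append(best)
        remaining.remove(best)
        current = best_h
    return selected, current


if __name__ == "__main__":
    chosen, h = select_features(TABLE, DECISION)
    print(chosen, round(h, 4))  # [0] 0.0
```

On the toy table the sketch keeps only attribute a_0, which alone reproduces the decision classes. Because this averaged relation is not monotone in P (adding an attribute can raise a pairwise similarity as well as lower it), the sketch stops as soon as no candidate lowers H(D|P) instead of waiting for H(D|P) to reach its value on the full attribute set.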
Keywords/Search Tags: Categorical data, Fuzzy information structures, Measurement, Feature selection