
Research on Classification Algorithms Based on Associated Information for Categorical Data

Posted on: 2019-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: K A Fu
Full Text: PDF
GTID: 2428330551458745
Subject: Computer application technology
Abstract/Summary:
With the rapid spread of the Internet, the mobile Internet, and the Internet of Things, the number of data sources keeps growing. The complexity of data is also increasing, which poses major challenges for data analysis. Among these complex data is categorical data, whose values are described qualitatively. Categorical data is common in real life; examples include colors and nucleotide codes. However, because categorical values lack magnitude and an inherent geometric structure, numerical calculations cannot be performed on them directly, so many classification approaches designed for numerical data fail on categorical data. Although classification methods for categorical data have been studied, their performance still lags well behind that achieved on numerical data. Effectively mining the internal information of categorical data therefore remains of great practical value for classification. To address the low classification accuracy on categorical data, this thesis systematically mines the correlation information between attributes and labels. The research is summarized as follows.

(1) A classification approach based on correlation analysis for categorical data is proposed. A new quantization method for categorical data is defined by analyzing the correlation between attributes and labels and by considering the frequency of each value of an attribute. On this basis, an SVM classification algorithm based on correlation analysis, named CA_SVM, is proposed. Experimental results on public UCI datasets show that CA_SVM achieves better classification performance than three other traditional classification algorithms for categorical data.

(2) A classification approach based on space correlation analysis for categorical data is proposed. To address the information loss in the quantization step of CA_SVM, a spatial representation of categorical data based on mutual information or conditional entropy is defined by further studying the space correlation between attributes and labels. Using the SVM and KNN models respectively, two improved classification algorithms based on space correlation analysis are designed, named SCA_SVM and SCA_KNN. Experimental results on public UCI datasets show that SCA_SVM and SCA_KNN better measure the distance or difference between different attributes and achieve better classification performance than CA_SVM and other traditional classification algorithms for categorical data.

In summary, to address the low classification accuracy on categorical data, this thesis proposes two classification algorithms based on correlation information obtained by mining the association between attribute values and labels. They effectively improve the classification accuracy on categorical data. In addition, the results can be regarded as an application extension of the SVM and KNN models.
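
The abstract does not give the exact quantization or spatial representation, so the following is only an illustrative sketch of the general idea, not the thesis's CA_SVM or SCA_* definitions. It assumes each categorical value is mapped to its vector of conditional label probabilities P(label | value), and that each attribute is weighted by its empirical mutual information with the labels. The function names (`conditional_label_probs`, `mutual_information`, `encode`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def conditional_label_probs(values, labels):
    """Map each attribute value v to [P(y | v) for y in sorted classes]."""
    classes = sorted(set(labels))
    joint = Counter(zip(values, labels))
    value_counts = Counter(values)
    return {
        v: [joint[(v, y)] / n_v for y in classes]
        for v, n_v in value_counts.items()
    }


def mutual_information(values, labels):
    """Empirical mutual information I(attribute; label) in bits."""
    n = len(values)
    joint = Counter(zip(values, labels))
    value_counts = Counter(values)
    label_counts = Counter(labels)
    mi = 0.0
    for (v, y), c in joint.items():
        # p(v, y) * log2( p(v, y) / (p(v) * p(y)) )
        mi += (c / n) * log2(c * n / (value_counts[v] * label_counts[y]))
    return mi


def encode(rows, labels):
    """Turn rows of categorical attributes into numeric vectors.

    Assumption for illustration: each value becomes its conditional
    label distribution, scaled by the attribute's mutual information.
    """
    columns = list(zip(*rows))
    tables = [conditional_label_probs(col, labels) for col in columns]
    weights = [mutual_information(col, labels) for col in columns]
    encoded = []
    for row in rows:
        vec = []
        for value, table, weight in zip(row, tables, weights):
            vec.extend(weight * p for p in table[value])
        encoded.append(vec)
    return encoded


# Toy data: the first attribute determines the label, the second is noise.
rows = [("red", "A"), ("red", "B"), ("blue", "A"), ("blue", "B")]
labels = [1, 1, 0, 0]
vectors = encode(rows, labels)
```

In this toy example the uninformative second attribute gets weight zero, so distances between the encoded vectors depend only on the informative attribute. Vectors of this kind could then be passed to a distance-based or margin-based classifier such as KNN or SVM.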
Keywords/Search Tags:Categorical data, Classification, Correlation analysis, Space Correlation