
Feature Weighting Research on the Information Bottleneck Method

Posted on: 2014-02-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Ji
Full Text: PDF
GTID: 1220330398477054
Subject: Computer software and theory
Abstract/Summary:
The Information Bottleneck (IB) method is a data analysis method that originates in information theory. It uses a special data representation, the joint probability model (JPM), and therefore expresses well the relationship between data and their features. However, compared with the vector space model (VSM), the JPM lacks the ability to express the relative importance of data features. As a result, most research on the IB method ignores feature importance, which weakens the method's effectiveness. To address this issue, this thesis studies feature weighting for the IB method. The goal is to highlight important features by means of feature weighting, thereby achieving a better data representation and improving the effectiveness of the IB method.

First, we propose a construction procedure from the VSM to a weighted JPM. We then propose a series of weighted IB methods: combination weighting, self-learning weighting, and two-stage three-angle weighting for non-co-occurrence data. Experimental results show that these methods are feasible and effective. In addition, we propose using mutual information gain for feature weighting evaluation; this evaluation reduces running time without sacrificing clustering quality. The results of this study help improve the effectiveness of the IB method and lay the foundation for building a set of feature-weighting IB analysis tools.

The main contributions of the study are as follows:

1) Propose a construction procedure from the VSM to a weighted JPM, based on a thorough analysis of the similarities and differences between the JPM and the VSM. The weighted JPM combines the advantages of both models: it expresses well the relationship between data and their features, and it can also express the relative importance of those features.

2) Propose a relative entropy combination weighting IB method.
How to choose a proper weighting scheme is a widely acknowledged hard problem. Combination weighting, derived from the idea of combination evaluation in multiple attribute decision making (MADM), overcomes the limitations of any single weighting scheme and helps reflect the essential characteristics of the data more faithfully. First, we consider only combinations of objective weighting schemes. Second, we choose relative entropy as the combination method, since it can be computed quickly. Experiments on real data show that the proposed CWRE-sIB algorithm outperforms the sIB algorithm.

3) Propose a feature weight self-learning mechanism for the IB method. A weight adjustment procedure is applied in the iteration stage, in which the feature weights are adjusted iteratively. The mechanism simultaneously minimizes the separation within clusters and maximizes the separation between clusters, thereby improving the quality of the clustering result. Experimental results show that the mechanism is objective and effective, and that the proposed FWA_CDsIB algorithm outperforms the CD-sIB algorithm.

4) Propose mutual information gain for weighting evaluation in the IB method. For most weighting schemes and combination weighting methods, the traditional way to evaluate feature weighting is to measure the quality of the resulting clustering. This is time-consuming, because the clustering algorithm must be run many times, and the number of runs depends on the number of weighting schemes or of combination weighting iterations. We propose instead to judge the quality of a feature weighting by the resulting gain in mutual information. The top s weighted data representations can then be selected from the set of weighted representations.
The best (or second best) clustering result can then be obtained from these top s representations. Experimental results show that the mutual information gain evaluation reduces running time without sacrificing clustering quality.

5) Propose a two-stage three-angle weighting method for non-co-occurrence data. To analyze non-co-occurrence data with the IB method, the data must first be transformed into co-occurrence data via a binary transformation. At the two stages of this transformation, we highlight representative features and dim irrelevant features from three viewpoints: non-co-occurrence, co-occurrence, and both. Experimental results show that the weighted binary transformation produces better co-occurrence data, and that the TPAW-sIB algorithm outperforms the CD-sIB, LIMBO, ROCK, and COOLCAT algorithms.
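As a rough illustration of the VSM-to-weighted-JPM construction in contribution 1), the following sketch scales each feature column of a VSM-style count matrix by a feature weight and normalizes the result into a joint distribution. The function name, toy data, and weights are illustrative assumptions, not the thesis's actual procedure.

```python
# Illustrative sketch (assumed names and data): build a weighted joint
# probability model (JPM) from a VSM term-count matrix.
import numpy as np

def weighted_jpm(counts, weights):
    """Scale each feature column by its weight, then normalize so the
    matrix sums to 1, giving a joint distribution p(document, feature)."""
    scaled = counts * weights          # broadcast weights across columns
    return scaled / scaled.sum()

# Toy VSM: 2 documents x 3 features (term counts), with assumed weights.
counts = np.array([[3.0, 0.0, 1.0],
                   [1.0, 2.0, 0.0]])
weights = np.array([1.0, 0.5, 2.0])

p_xy = weighted_jpm(counts, weights)
assert np.isclose(p_xy.sum(), 1.0)     # a valid joint distribution
```

In this form, a feature's weight directly rescales the probability mass it carries, which is how a weighted JPM can express feature importance while keeping the joint-probability representation the IB method requires.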
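The mutual-information-gain evaluation in contribution 4) can be sketched in the same spirit: compute I(X;Y) for each candidate weighted representation and keep the s highest-scoring ones, instead of running the clustering algorithm once per candidate. The function names are assumptions for illustration; the thesis's actual criterion may differ in detail.

```python
# Illustrative sketch (assumed names): rank candidate weighted joint
# distributions by their mutual information and keep the top s.
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) ), in nats."""
    px = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = p_xy > 0                        # skip zero cells (0 log 0 = 0)
    return float((p_xy[mask] * np.log(p_xy[mask] / (px @ py)[mask])).sum())

def top_s(candidates, s):
    """Select the s joint distributions with the highest I(X;Y)."""
    return sorted(candidates, key=mutual_information, reverse=True)[:s]

# Sanity checks: an independent joint gives I = 0; a diagonal joint on
# two symbols gives I = log 2.
indep = np.outer([0.5, 0.5], [0.5, 0.5])
diag = np.eye(2) / 2
assert abs(mutual_information(indep)) < 1e-12
assert abs(mutual_information(diag) - np.log(2)) < 1e-12
```

Since each candidate is scored with a single pass over its joint distribution, the cost per candidate is far below one full clustering run, which matches the abstract's claim of reduced running time.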
Keywords/Search Tags: Information Bottleneck, Feature Weighting, Combination Weighting, Self-learning, Weighting Evaluation, Non-Co-occurrence Data