Font Size: a A A

Finding natural clusters through entropy minimization

Posted on:1990-04-26Degree:Ph.DType:Thesis
University:Carnegie Mellon UniversityCandidate:Wallace, Richard ScotFull Text:PDF
GTID:2478390017954148Subject:Computer Science
Abstract/Summary:PDF Full Text Request
The main contribution of this thesis is a two-step procedure that finds natural clusters in geometric point data without requiring a user to specify any threshold parameters or "magic numbers." The first step exploits a new algorithm called NIHC (Numerical Iterative Hierarchical Clustering). NIHC finds a cluster tree minimizing an entropy objective function. The second step is a recursive procedure that searches the tree for level clusters having minimum description length (MDL).; This thesis reports experiments with and analysis of NIHC. The algorithm uses a transformation called the grab operation to iteratively reduce the value of an objective function defined recursively over the tree. Gaussian entropy is one such function studied here. NIHC reaches lower local entropy minima than the standard agglomerative algorithm.; The input to NIHC is an arbitrary cluster tree such as a k-d tree. NIHC repeatedly searches for grabs to transform each subtree into a lower energy state. When NIHC can find no more energy-reducing grabs resulting tree is partially optimal in a strong sense that does not hold for cluster trees produced by other algorithms. Experiments quantitatively compare NIHC using Gaussian entropy with other hierarchical clustering algorithms. To cluster n points, an iteration of NIHC takes at most O(n{dollar}sp3{dollar}) steps for an unbalanced tree and at most O(n{dollar}sp2{dollar}) steps for a balanced tree. NIHC does not store an O(n{dollar}sp2{dollar}) cluster distance matrix. NIHC consumes only O(n) storage for the tree itself. This thesis also reports further speedups, such as a branch-and-bound procedure, heuristics to prune the grab search and parallel versions of NIHC.; MDL theory is concerned with finding minimum information models that encode data. A description length formula tells how many bits are needed to represent the data given some clustering, plus the bits needed for the clustering itself. MDL theory eliminates free parameters from clustering objective functions. A program finding the MDL level clustering solves some point perceptual grouping problems.; A space-time clustering program uses NIHC to track time-varying clusters in robot sensor data, such as data obtained from from a range sensor in the presence of moving obstacles. This program is demonstrated in two domains: following a road in ERIM range data and tracking multiple moving obstacles in 2-d laser range data. ftn*This research was sponsored in part by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 7976 under Contract F33614-87-C-1499 and monitored by: Avionics Laboratory, Air Force Wright Aeronautical Laboratories, Aeronautical Systems Division (AFSC), Wright-Patterson AFB, OH 45433-6543. Support was also given by a Hughes Aircraft Company Fellowship.; The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency, Hughes Aircraft Corporation or the U.S. Government.
Keywords/Search Tags:Cluster, NIHC, Entropy, Data, Tree, Finding, MDL
PDF Full Text Request
Related items