Mixed Integer Linear Programming Based Implementations of Logical Analysis of Data and Its Applications

Posted on: 2014-12-26
Degree: Ph.D.
Type: Dissertation
University: Florida Institute of Technology
Candidate: Avila Herrera, Juan Felix
GTID: 1450390005989829
Subject: Operations Research
Abstract/Summary:
The objective of this dissertation is to develop systematic procedures that take advantage of advanced combinatorial optimization techniques and developments in computing to build on a previously successful two-class classification method, Logical Analysis of Data (LAD), by optimizing feature selection and identifying the set of combinatorial patterns in large-scale data analysis.

First, we propose an embedded, pattern-based feature selection technique. Our feature selection algorithm aims to identify a small subset of highly influential features in a large-scale dataset in order to build reliable LAD classification models. The proposed method searches among different feature subsets and interacts with the LAD classification algorithm and its ability to discriminate among the classes. To accomplish this, we develop a new software tool, called LFW, which can be used to determine the highest-ranking features in the dataset.

Next, we propose a new approach based on integer programming and network flows to select significant patterns for generating accurate LAD models. Our algorithm allows the user to specify significance requirements on patterns, such as statistical significance, Hamming distance to ideal patterns, and other pattern characteristics, including homogeneity and prevalence. Through several experiments on artificial and benchmark datasets, we compare the accuracy of LAD classification models built with our proposed approach against the accuracy of LAD models built with greedy heuristics.

Traditionally, the LAD algorithm is designed to solve two-class classification problems. We present a mixed integer linear program that extends the LAD algorithm to multi-class classification. Our multi-class LAD algorithm efficiently generates reliable multi-class LAD models and takes advantage of parallel programming. The utility of the proposed approach is demonstrated through several experiments on multi-class benchmark datasets.

Finally, we apply the techniques developed in this dissertation to a real-world medical dataset collected as part of the African-American Study of Kidney Disease and Hypertension (AASK). We present various classification models to predict the progression rate of chronic kidney disease and to identify the set of serum proteomic features most strongly related to the disease outcome.
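
The pattern characteristics named above (homogeneity and prevalence) can be illustrated with a minimal sketch. It assumes the data are already binarized and uses the standard LAD reading of the terms: a positive pattern is a conjunction of literals, its prevalence is the fraction of positive observations it covers, and its homogeneity is the fraction of covered observations that are positive. The function names, the dict-based pattern representation, and the toy data are hypothetical. The brute-force enumeration is a plain stand-in for illustration only. It is not the integer-programming and network-flow selection, or the MILP, described in this dissertation.

```python
from itertools import combinations, product

# Hypothetical representation: a pattern is a dict {feature_index: required_bit},
# i.e. a conjunction of literals over binarized features.

def covers(pattern, x):
    """True if binary observation x satisfies every literal in the pattern."""
    return all(x[j] == v for j, v in pattern.items())

def prevalence(pattern, positives):
    """Fraction of positive observations covered by the pattern."""
    return sum(covers(pattern, x) for x in positives) / len(positives)

def homogeneity(pattern, positives, negatives):
    """Fraction of covered observations that are positive.
    Returns 0.0 if the pattern covers no observation at all."""
    pos = sum(covers(pattern, x) for x in positives)
    neg = sum(covers(pattern, x) for x in negatives)
    return pos / (pos + neg) if pos + neg else 0.0

def pure_positive_patterns(positives, negatives, max_degree=2):
    """Brute-force stand-in (not the dissertation's method): enumerate every
    conjunction of up to max_degree literals that covers no negative
    observation and at least one positive observation."""
    n = len(positives[0])
    found = []
    for d in range(1, max_degree + 1):
        for feats in combinations(range(n), d):
            for bits in product((0, 1), repeat=d):
                p = dict(zip(feats, bits))
                if any(covers(p, x) for x in negatives):
                    continue  # covers a negative: not pure
                if any(covers(p, x) for x in positives):
                    found.append(p)
    return found

# Toy binarized data (hypothetical): three features per observation.
positives = [(1, 0, 1), (1, 1, 1), (1, 0, 0)]
negatives = [(0, 0, 1), (0, 1, 0), (1, 1, 0)]

patterns = pure_positive_patterns(positives, negatives)
print(len(patterns))  # → 4
for p in patterns:
    print(p, prevalence(p, positives), homogeneity(p, positives, negatives))
```

On this toy data, the single literal x0 = 1 covers all positives but also one negative, so its homogeneity is 0.75. The degree-2 patterns are pure (homogeneity 1.0) but less prevalent. This trade-off is the kind of requirement a user-specified selection model would have to balance.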
Keywords/Search Tags: LAD, Data, Programming, Integer, Feature