Univariate and bivariate variable selection in high dimensional data

Posted on: 2005-10-16
Degree: Ph.D
Type: Dissertation
University: University of California, Berkeley
Candidate: Ng, Vivian Wai Ying
Full Text: PDF
GTID: 1450390008979950
Subject: Statistics
Abstract/Summary:
Many variable (feature) selection methods have been introduced over the past several years. Most of them are aimed at classification problems in high-dimensional input spaces. However, these widely used methods are shown to have many drawbacks. These weaknesses pose new challenges in analyzing large data sets in various domains, such as quantitative structure-activity relationship (QSAR) modeling. Because no existing method offers an optimal solution, a universal and robust new variable selection method that is not confined by these known restrictions is needed.

The new univariate variable selection method, which is built on top of Random Forest, is shown to be effective. The method not only performs exceptionally well on real and artificial data sets, but is also computationally fast and efficient. Its performance is demonstrated to be at least as good as that of existing methods. The usefulness of the new technique is further demonstrated through its application to the NIPS 2003 feature selection challenge.

Currently, little emphasis is placed on selecting pairs of variables that have a significant joint effect but little influence individually. Univariate variable selection methods are not suited to selecting such pairs of linked variables. To address this issue, a new bivariate variable selection method is proposed and shown to have the desired property of identifying pairs of significant variables. The performance of this method is verified on real and simulated data sets, along with permutation tests that provide a baseline for assessing the significance of the identified pairs. Furthermore, a graphical device is introduced to display the relationship between each coupled pair and the response variable.
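The idea of a pair of variables with a joint effect but no individual effect, and of a permutation test as a significance baseline, can be illustrated with a small sketch. The abstract does not state which pair-scoring statistic the dissertation uses, so the code below is not the author's method: it assumes a simple stand-in score (the accuracy of a majority-vote rule within each cell of the variables' values), and all function names (`cell_majority_accuracy`, `permutation_p_value`) are hypothetical. The toy response is the XOR of two binary variables, a standard case where neither variable is informative alone.

```python
import random


def cell_majority_accuracy(columns, y):
    """Stand-in score (an assumption, not the dissertation's statistic):
    accuracy of predicting the majority class within each cell defined
    by the joint values of the given columns."""
    counts = {}
    for row in zip(*columns, y):
        key, label = row[:-1], row[-1]
        cell = counts.setdefault(key, [0, 0])  # [count of class 0, class 1]
        cell[label] += 1
    return sum(max(cell) for cell in counts.values()) / len(y)


def permutation_p_value(x1, x2, y, n_perm=200, seed=0):
    """Compare the observed joint score with scores obtained after
    shuffling the response, which breaks any real association."""
    rng = random.Random(seed)
    observed = cell_majority_accuracy([x1, x2], y)
    shuffled = list(y)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cell_majority_accuracy([x1, x2], shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Toy data: y is the XOR of two binary variables, so each variable
# alone is uninformative while the pair determines y exactly.
data_rng = random.Random(1)
n = 200
x1 = [data_rng.randint(0, 1) for _ in range(n)]
x2 = [data_rng.randint(0, 1) for _ in range(n)]
y = [a ^ b for a, b in zip(x1, x2)]

score_x1 = cell_majority_accuracy([x1], y)   # univariate score, variable 1
score_x2 = cell_majority_accuracy([x2], y)   # univariate score, variable 2
observed, p_value = permutation_p_value(x1, x2, y)

print("univariate scores:", score_x1, score_x2)
print("joint score:", observed, "permutation p-value:", p_value)
```

In this sketch the univariate scores stay near chance while the joint score is far above the permutation baseline, which is the kind of pair a purely univariate screen would miss.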
Keywords/Search Tags: Variable, Selection, Methods, Univariate, Data