Model Selection For High-dimensional Interactive Problems

Posted on: 2022-10-22
Degree: Doctor
Type: Dissertation
Institution: University
Candidate: Asenso Theophilus Quachie
GTID: 1480306521967039
Subject: Statistics
Abstract/Summary:
In this thesis we consider classification models involving main-effect coefficients and interaction coefficients. Analyzing high-dimensional data with conventional tools is very challenging for statisticians. One reason is that most high-dimensional data contain many interaction effects among the covariates, yet the majority of existing methods analyze such datasets using general additive modeling techniques. This work focuses on a method for estimating covariates and their interaction effects in the binary and multiclass classification settings. The thesis involves three applications of the pliable lasso model.

In the first part, we study the classification problem with interactive effects for multinomial logistic regression models. Our approach implements the pliable lasso penalty, which allows for estimating the main effects of the covariates X and the interaction effects between the covariates and a set of modifiers Z. The hierarchical penalty helps to avoid over-fitting by excluding the interaction effects when the corresponding main effects are zero. The original log-likelihood problem is transformed into an iteratively reweighted least squares problem with the pliable lasso penalty, and the block-wise coordinate descent approach is then employed. The approach allows us to solve the high-dimensional multiclass problem in situations where interaction variables are significant in improving the prediction accuracy.

In the second part, we replace the l2-norm of the support vector machine (SVM) with the pliable lasso penalty to allow the SVM model to consider possible interactions. The loss function employed is the squared hinge loss, combined with the pliable lasso penalty. The nature of the squared hinge loss allows us to implement the block-wise coordinate descent approach in optimizing the objective function. With this approach, we propose an algorithm that computes an entire regularization path for the support vector machine with interaction effects. We show through simulation and real-data applications that our SVM model is efficient in handling high-dimensional interactive problems. It also allows us to analyze binary response data with group effects that may be hierarchical, using the SVM.

In the final part of this thesis, we propose differential privacy for empirical risk minimization problems with interactive effects. This method is employed to ensure stability, and hence good generalization, of the pliable lasso model when performing adaptive data analysis. Our problem involves some regularization functions which do not meet the differentiability condition for privacy settings. The block-wise coordinate descent approach is employed to allow our model to satisfy this condition. In contrast to the classical gradient descent algorithm, where updates operate on a single model vector, our approach performs the optimization on a single coordinate, and Gaussian noise is added to this coordinate to ensure privacy. This allows us to employ lasso-type penalties with ease, without rigorous transformations or computations. Finally, we apply our model to binomial and multinomial logistic regression. The numerical studies demonstrate that our method is both effective and efficient and can be used in analyses where interactive variables are involved.
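
The coordinate-wise noisy update described in the final part can be illustrated with a minimal sketch. It is not the thesis's algorithm: it uses a plain lasso penalty instead of the pliable lasso, binary logistic loss, and per-sample clipping as an assumed way to bound each sample's influence. The names `noisy_coordinate_descent`, `soft_threshold`, `sigma`, and `clip` are hypothetical, and `sigma` is a free parameter that is not calibrated to any privacy budget.

```python
import numpy as np


def soft_threshold(z, t):
    """Lasso shrinkage: proximal operator of t * |z|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def noisy_coordinate_descent(X, y, lam=0.05, sigma=0.1, clip=1.0,
                             n_epochs=20, seed=0):
    """L1-penalized logistic regression by noisy proximal coordinate descent.

    X: (n, p) array; y: (n,) array of 0/1 labels.
    sigma: std of the Gaussian noise added to each coordinate gradient
           (illustrative only; NOT calibrated to a privacy guarantee).
    clip:  bound on each sample's contribution to a coordinate gradient
           (an assumption of this sketch, not taken from the thesis).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    # Per-coordinate step sizes: the logistic curvature along
    # coordinate j is at most mean(x_j^2) / 4.
    step = 4.0 / np.maximum((X ** 2).mean(axis=0), 1e-12)
    for _ in range(n_epochs):
        for j in range(p):
            prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
            # Clipped per-sample contributions to the j-th partial derivative.
            contrib = np.clip((prob - y) * X[:, j], -clip, clip)
            # Gaussian noise is added to this single coordinate only.
            grad_j = contrib.mean() + rng.normal(0.0, sigma)
            # Gradient step on the smooth loss, then lasso shrinkage,
            # so the non-differentiable penalty is never differentiated.
            beta[j] = soft_threshold(beta[j] - step[j] * grad_j,
                                     step[j] * lam)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 5))
    y = (X[:, 0] > 0).astype(float)  # only the first covariate matters
    print(noisy_coordinate_descent(X, y, lam=0.01, sigma=0.05))
```

Because the penalty enters only through the shrinkage step, the noise needs to be applied to the smooth loss term alone, which is one way to read the abstract's remark that lasso-type penalties can be used without further transformations.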
Keywords/Search Tags: Interaction model, High-dimensionality, Classification problems, Differential privacy, Pliable lasso