| In classical logistic regression, the model is estimated by maximum likelihood. When the number of predictors p is ultra-high (p ≫ n), maximum likelihood estimation may be computationally infeasible and statistically inaccurate. To ensure the interpretability and accuracy of the estimated model, an effective variable selection method is needed. In some engineering and scientific applications, the predictors are naturally grouped. In the high-dimensional regression setting, some existing methods borrow strength across distinct groups of variables; these methods can remove unimportant variables and consistently estimate the effects of important ones. In most cases, however, such external information about the group structure is not available. The cluster elastic net for linear regression addresses this by inferring clusters of features from the data, so it can select important variables even when the clusters are unknown. By carrying the idea of the cluster elastic net for linear regression over to logistic regression, this paper proposes the cluster elastic net for logistic regression. Instead of assuming that the clusters are known a priori, the cluster elastic net for logistic regression estimates the clusters from the data, on the basis of the correlation among the variables as well as their association with the response. In addition to penalizing the coefficients themselves, this paper proposes a novel cluster penalty, which selectively shrinks the coefficients within a cluster towards each other rather than towards the origin. This paper describes the theoretical advantages of the method and its estimation algorithm, and then explores its performance in a simulation study. |
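To make the idea of "shrinking coefficients within a cluster towards each other" concrete, the following is a minimal sketch, not the paper's method. It assumes a penalized objective of the form mean logistic loss + λ‖β‖₁ + (δ/2) Σₖ (1/|Cₖ|) Σ_{j,j'∈Cₖ} (βⱼ − βⱼ')², with the clusters Cₖ held fixed, no intercept, and a plain proximal-gradient (ISTA) solver. The abstract does not give the exact penalty: the linear-regression cluster elastic net may instead penalize differences of the fitted contributions Xⱼβⱼ, and the paper's algorithm also re-estimates the clusters, which this sketch does not do. All function names (`soft_threshold`, `cluster_penalty_grad`, `objective`, `fit_fixed_clusters`) are hypothetical.

```python
import numpy as np


def soft_threshold(v, thresh):
    """Elementwise soft-thresholding: the proximal operator of thresh * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)


def cluster_penalty_grad(beta, clusters):
    """Gradient of sum_k sum_{j in C_k} (beta_j - mean_k)^2 with respect to beta."""
    grad = np.zeros_like(beta)
    for idx in clusters:
        idx = np.asarray(idx)
        grad[idx] = 2.0 * (beta[idx] - beta[idx].mean())
    return grad


def objective(beta, X, y, clusters, lam, delta):
    """Mean logistic loss + lasso term + assumed cluster penalty (hypothetical form)."""
    eta = X @ beta
    nll = np.mean(np.logaddexp(0.0, eta) - y * eta)
    # (1/|C_k|) * sum over all pairs (j, j') in C_k equals 2 * within-cluster sum of
    # squares, so (delta/2) * that sum is delta * sum_{j in C_k} (beta_j - mean_k)^2.
    within = sum(np.sum((beta[np.asarray(c)] - beta[np.asarray(c)].mean()) ** 2)
                 for c in clusters)
    return nll + lam * np.sum(np.abs(beta)) + delta * within


def fit_fixed_clusters(X, y, clusters, lam, delta, n_iter=2000):
    """ISTA for the objective above with the clusters held fixed (assumption)."""
    n, p = X.shape
    # Lipschitz constant of the smooth part: logistic loss is bounded by
    # ||X||_2^2 / (4n); the cluster term has largest Hessian eigenvalue 2*delta.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / (4.0 * n) + 2.0 * delta)
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 0.5 * (1.0 + np.tanh(0.5 * (X @ beta)))  # overflow-safe sigmoid
        grad = X.T @ (prob - y) / n + delta * cluster_penalty_grad(beta, clusters)
        # Gradient step on the smooth part, then the lasso proximal step.
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


# Usage on synthetic data: columns 0-2 are noisy copies of one latent factor
# (which drives the response), columns 3-5 are copies of an unrelated factor.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=(n, 2))
X = np.hstack([z[:, [0]] + 0.1 * rng.normal(size=(n, 3)),
               z[:, [1]] + 0.1 * rng.normal(size=(n, 3))])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-2.0 * z[:, 0]))).astype(float)
clusters = [[0, 1, 2], [3, 4, 5]]

beta_lasso = fit_fixed_clusters(X, y, clusters, lam=0.01, delta=0.0)
beta_cen = fit_fixed_clusters(X, y, clusters, lam=0.01, delta=10.0)
print("lasso only:   ", np.round(beta_lasso, 3))
print("with cluster: ", np.round(beta_cen, 3))
```

With δ = 0 the fit reduces to a lasso-penalized logistic regression, which tends to spread weight unevenly across highly correlated columns; with a large δ the coefficients inside the first cluster are pulled towards their common mean rather than towards zero, which is the qualitative behavior the abstract describes.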