Penalized Empirical Likelihood Method Of Logistic Regression In High-dimensional Classification

Posted on: 2016-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: W W Xu
Full Text: PDF
GTID: 2297330467480065
Subject: Statistics
Abstract/Summary:
With the rapid development of the Internet, the formation of the information industry, and the sustained growth of China's economy, both the volume and the dimension of global data have grown explosively, and human society has entered the era of big data. The influence of big data on economic and social development has become ever wider and deeper. In the big data context, high-dimensional data plays an important role and exists in all fields of society. The increase in data dimension brings the curse of dimensionality to statistical inference. In this setting, distinguishing useful information from useless "junk" has gradually become a hard problem. To select the useful information, all information first needs to be classified. Classification of high-dimensional data has therefore become an important research issue, one of theoretical significance and wide application value as well as a challenge.

Research on the classification problem can be approached from two perspectives: statistical analysis and machine learning. Most classification methods are based on the data itself and do not consider the structure of the data. The Logistic regression classification method, by contrast, is based on a specific model and is very effective in solving classification problems. Compared with other classification methods, the Logistic regression model has many advantages. On the one hand, compared with data-driven classification methods, the Logistic regression model can explain its results in addition to giving the probability of each category. On the other hand, compared with other linear-model classification methods, the Logistic regression model does not require any prior knowledge about the sample or any distributional assumption.
In addition, it places no requirements on the type of the independent variables. The Logistic regression model is therefore widely applied in various fields as an effective method of data classification. When the Logistic regression model is used to solve classification problems, however, the problem of parameter estimation remains. The empirical likelihood estimation method for the Logistic regression model has certain advantages, especially when the distribution of the data is unknown.

Against this background, the Logistic regression model was applied to the study of high-dimensional data classification problems. The main research contents were as follows. A Logistic regression model was established for high-dimensional data classification, and a penalized empirical likelihood method based on the Logistic regression model was proposed. The large-sample properties of the empirical likelihood estimation were also proved. Simulations showed that the penalized empirical likelihood estimation of the Logistic regression model was effective in solving the classification problem. Finally, the penalized empirical likelihood method based on the Logistic regression model was applied to specific examples.

This paper was organized as follows.

The first chapter introduced the background and significance of the Logistic regression model for classification in the big data setting. It also reviewed research results on the high-dimensional classification problem, the Logistic regression model, and the penalized empirical likelihood method.

The second chapter gave the main body of the theory. A penalized empirical likelihood function of the Logistic regression model was constructed, and the penalized empirical likelihood method based on the Logistic regression model for high-dimensional data was proposed.
The local quadratic algorithm and the adjusted BIC criterion were used in the penalized empirical likelihood method of the Logistic regression model. The oracle properties of the penalized empirical likelihood estimation of the Logistic regression model were also proved.

The third chapter presented the numerical simulation. Two simulation examples, one with the distributional assumption correctly specified and one with it misspecified, were used to assess parameter estimation accuracy, goodness of fit of the model, and classification accuracy. They showed that the penalized empirical likelihood method of the Logistic regression model was effective in solving the classification problem.

The fourth chapter presented the example analysis. Coronary heart disease data and breast cancer data were analyzed, and the penalized empirical likelihood method of the Logistic regression model was applied to both examples. Comparisons with other methods showed that the penalized empirical likelihood method of the Logistic regression model has good classification properties.
Keywords/Search Tags: Classification problems, Logistic regression model, High-dimensional data, Penalized empirical likelihood, SCAD
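
The abstract mentions a local quadratic algorithm and the keywords list SCAD, but neither is defined on this page. As a rough illustration only, and assuming the penalty is the standard SCAD penalty with the conventional default a = 3.7, the penalty, its derivative, and the weights that a local quadratic approximation would typically use can be sketched in Python as below. The function names and the `eps` safeguard are hypothetical and are not taken from the thesis.

```python
import numpy as np


def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty p_lam(|theta|), evaluated elementwise.

    Assumption: a > 2 and lam > 0; a = 3.7 is the conventional default.
    """
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t                                               # |theta| <= lam
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |theta| <= a*lam
    constant = lam**2 * (a + 1) / 2                                # |theta| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))


def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |theta|."""
    t = np.abs(np.asarray(theta, dtype=float))
    # For |theta| > lam the slope decays linearly and reaches zero at a*lam.
    tail = np.maximum(a * lam - t, 0.0) / (a - 1)
    return np.where(t <= lam, lam, tail)


def lqa_weights(theta0, lam, a=3.7, eps=1e-8):
    """Weights p'(|theta0|) / |theta0| of the quadratic surrogate around theta0.

    The surrogate is p(|theta0|) + 0.5 * w * (theta**2 - theta0**2).
    eps is a hypothetical safeguard against division by zero.
    """
    t0 = np.abs(np.asarray(theta0, dtype=float))
    return scad_derivative(t0, lam, a) / (t0 + eps)


if __name__ == "__main__":
    theta = np.array([0.5, 2.0, 10.0])
    print(scad_penalty(theta, lam=1.0))
    print(scad_derivative(theta, lam=1.0))
    print(lqa_weights(theta, lam=1.0))
```

Under these assumptions, small coefficients receive large quadratic weights and are shrunk toward zero, while coefficients beyond a*lam receive zero weight and are left unpenalized. This is the behavior usually associated with the oracle property that the abstract refers to.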