
Study On Diagnostic Methods Of Multicollinearity In Logistic Regression

Posted on: 2011-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: X M Yu
GTID: 2144360305975699
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: To study methods for diagnosing multicollinearity in logistic regression so that a correct model can be established, and, by comparing the various diagnostic methods, to identify one that is effective and easy for medical researchers to use.

Methods: The diagnostic methods used in multiple linear regression were applied to logistic regression. The worked example was a study of factors associated with postpartum depression; the independent variables included relationship with parents, the four scales of the EPQ personality questionnaire (EPQL, EPQE, EPQP, EPQN), previous history of depression, maternal sleep, and others. The pairwise correlation coefficients between independent variables, the variance inflation factor, the tolerance, the eigenvalue system, the multiple coefficient of determination, and the value of the determinant were computed. The effectiveness, advantages, and disadvantages of these methods were evaluated from the results.

Results:
1. Pairwise correlation coefficients between independent variables: the correlation coefficients between EPQE and EPQN, EPQL and EPQE, EPQL and EPQN, and EPQP and EPQN are relatively large, indicating multicollinearity among these four pairs of variables.
2. Variance inflation factor and tolerance: EPQE and EPQN have relatively large variance inflation factors and correspondingly small tolerances, so EPQE is collinear with other variables, and so is EPQN.
3. Eigenvalue system: four eigenvalues are smaller than 0.05 and two are smaller than 0.01, which suggests two to four collinear relationships in the model. Two condition indices are greater than 30. One is 43.550, with variance proportions of 88% for EPQL and 49% for EPQE. The other is 60.026, with variance proportions of 48% for EPQE and 52% for EPQN. Collinearity exists in these two pairs of variables.
4. Multiple coefficient of determination: all independent variables except EPQN can induce multicollinearity.
5. Value of the determinant: D = 6.9296 × 10^-10 < 0.01, so there is serious multicollinearity in the model.
6. Based on the results of these diagnostics, the variables that induced multicollinearity were removed from the model and the logistic regression was run again; the results then agreed with those of similar studies.

Conclusions: Logistic regression is also sensitive to multicollinearity, so this problem deserves attention. Some diagnostic methods used in multiple linear regression can also be applied to logistic regression. After comparing the methods, the author considers the variance inflation factor, the tolerance, and the eigenvalue system appropriate for medical research because they are easy to use and effective. They can help researchers resolve multicollinearity and draw correct conclusions.
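
The diagnostics named in the abstract (pairwise correlations, variance inflation factor, tolerance, eigenvalues with condition indices and variance proportions, and a determinant) depend only on the predictors, which is why they carry over from linear to logistic regression. The following is a minimal sketch under stated assumptions, not the thesis's own procedure: the data are synthetic, the function and variable names are hypothetical, the determinant is assumed to be that of the predictor correlation matrix, and the eigenvalue system is assumed to be computed from the intercept-augmented design matrix with columns scaled to unit length.

```python
import numpy as np


def collinearity_diagnostics(X):
    """Collinearity diagnostics for a predictor matrix X (n_samples x n_predictors).

    X must not contain an intercept column; one is added for the eigenvalue system.
    """
    X = np.asarray(X, dtype=float)

    # Pairwise correlation matrix of the predictors.
    R = np.corrcoef(X, rowvar=False)

    # VIF_j is the j-th diagonal element of the inverse correlation matrix,
    # which equals 1 / (1 - R_j^2); tolerance is its reciprocal, 1 - R_j^2.
    vif = np.diag(np.linalg.inv(R))
    tolerance = 1.0 / vif
    r_squared = 1.0 - tolerance  # multiple coefficient of determination per predictor

    # Assumption: the determinant is that of the correlation matrix.
    determinant = np.linalg.det(R)

    # Eigenvalue system: add an intercept, scale each column to unit length,
    # then take the singular value decomposition.
    Z = np.column_stack([np.ones(X.shape[0]), X])
    Z = Z / np.linalg.norm(Z, axis=0)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigenvalues = s ** 2                  # eigenvalues of Z'Z
    condition_index = s.max() / s         # sqrt(lambda_max / lambda_j)

    # Variance decomposition proportions: phi[k, j] = v_kj^2 / s_j^2,
    # normalised so that each coefficient's proportions sum to 1.
    phi = (Vt.T ** 2) / (s ** 2)
    var_prop = (phi / phi.sum(axis=1, keepdims=True)).T  # rows: components

    return {
        "correlation": R,
        "vif": vif,
        "tolerance": tolerance,
        "r_squared": r_squared,
        "determinant": determinant,
        "eigenvalues": eigenvalues,
        "condition_index": condition_index,
        "variance_proportions": var_prop,
    }


# Synthetic example (not the thesis data): b is almost a copy of a, c is independent.
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = a + 0.02 * rng.normal(size=n)
c = rng.normal(size=n)
X = np.column_stack([a, b, c])

diag = collinearity_diagnostics(X)
for name in ("vif", "tolerance", "determinant", "condition_index"):
    print(name, np.round(diag[name], 6))
```

In this synthetic setup the two nearly identical predictors should show large variance inflation factors, small tolerances, a near-zero determinant, and a large condition index whose variance proportions load on both of them, while the independent predictor stays unremarkable. The cut-offs quoted in the abstract (condition index above 30, determinant below 0.01) can then be applied to the returned values.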
Keywords: logistic regression, multicollinearity diagnosis