
Comparisons of linear probability model, linear discriminant function, logistic regression, and K-means clustering in two-group prediction

Posted on: 2004-09-17
Degree: Ph.D.
Type: Dissertation
University: Indiana University
Candidate: So, Tak-Shing Harry
GTID: 1450390011456770
Subject: Statistics
Abstract/Summary:
The use of statistical models to predict group membership is common in education. Several methods can formulate a relationship between a dichotomous outcome variable and a set of predictors. The questions facing educational researchers are (1) which method yields a better model for prediction and (2) on what basis. This study is the first to systematically and comprehensively compare the predictive accuracies of the linear probability model (LPM), linear discriminant function analysis (LDF), logistic regression (LRM), and K-means clustering (KM) in a two-group situation. Multivariate normal populations were simulated under combinations of population proportions, equality of covariance matrices, and group separation. LPM, LDF, LRM, and KM were applied to training samples drawn according to pre-specified sample representativeness and sample sizes. Error rates were tabulated from cross-validation, in which the four fitted models were applied to test samples. The study set and accomplished two objectives. The first objective was to investigate how various data properties and different prior probabilities affect the predictive accuracy of LDF. The results indicated that (1) assuming equal prior probabilities minimized the error rate in predicting membership in the smaller population, and (2) setting prior probabilities from the sample proportions minimized the error rate for the larger population. The second objective was to compare the accuracy of two-group membership prediction obtained from the linear probability model, linear discriminant function, logistic regression, and K-means clustering under various data properties.
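The first objective turns on where the LDF cutoff sits. With group means m1 and m0, pooled covariance S, and priors p1 and p0, a case x is assigned to group 1 when (m1 - m0)' S^-1 x - ½ (m1 - m0)' S^-1 (m1 + m0) > ln(p0 / p1). Equal priors put the cutoff at 0. Sample-proportion priors raise it when group 1 is the smaller group, so fewer cases are assigned to that group. The sketch below illustrates this trade-off under assumptions of our own: two bivariate normal populations with identity covariance, a smaller group with proportion 0.1 shifted by 1.5 on the first predictor, and the hypothetical helpers `simulate`, `fit_ldf`, `predict_ldf`, and `group_error_rates`. These settings are illustrative only and are not the dissertation's simulation design.

```python
import numpy as np

def fit_ldf(X, y):
    """Estimate the two-group LDF. Assumed coding: y == 1 is the smaller
    group, y == 0 the larger group."""
    X1, X0 = X[y == 1], X[y == 0]
    m1, m0 = X1.mean(axis=0), X0.mean(axis=0)
    n1, n0 = len(X1), len(X0)
    # Pooled within-group covariance (LDF assumes equal covariance matrices)
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n0 - 1) * np.cov(X0, rowvar=False)) / (n1 + n0 - 2)
    w = np.linalg.solve(S, m1 - m0)   # discriminant weights S^-1 (m1 - m0)
    c = 0.5 * w @ (m1 + m0)           # score of the midpoint between the means
    return w, c

def predict_ldf(X, w, c, p1, p0):
    """Assign group 1 when the LDF score exceeds ln(p0 / p1)."""
    return (X @ w - c > np.log(p0 / p1)).astype(int)

def group_error_rates(y_true, y_pred):
    """Within-group error rates: (smaller group, larger group)."""
    return (np.mean(y_pred[y_true == 1] != 1),
            np.mean(y_pred[y_true == 0] != 0))

def simulate(n, prop1, delta, rng):
    """Hypothetical design: identity covariance in both groups; group 1 has
    proportion prop1 and its mean is shifted by delta on the first predictor."""
    y = (rng.random(n) < prop1).astype(int)
    X = rng.standard_normal((n, 2)) + np.outer(y, [delta, 0.0])
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = simulate(500, 0.1, 1.5, rng)
X_test, y_test = simulate(5000, 0.1, 1.5, rng)
w, c = fit_ldf(X_train, y_train)

p1_hat = y_train.mean()  # sample proportion of the smaller group
equal = group_error_rates(y_test, predict_ldf(X_test, w, c, 0.5, 0.5))
sample = group_error_rates(y_test, predict_ldf(X_test, w, c, p1_hat, 1 - p1_hat))
print("equal priors  (smaller, larger):", equal)
print("sample priors (smaller, larger):", sample)
```

Because both rules share the same fitted weights and differ only in the cutoff, the set of cases assigned to the smaller group under sample-proportion priors is contained in the set assigned under equal priors. The smaller group's error rate therefore cannot fall, and the larger group's cannot rise, when moving from equal to sample-proportion priors. This is the direction of the pattern reported above.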
The findings revealed that when accurately predicting membership in the smaller population was the main objective, (1) LPM was not the method of choice when the population proportions were extreme, (2) the choice between LDF and LRM depended on the population proportions, the equality of covariance matrices, and group separation, and (3) depending on the data pattern, KM was a viable alternative.
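The four-way comparison can be sketched as one train/test loop that records each method's error rate within the smaller and the larger group. Everything here is a hypothetical illustration, not the dissertation's code. The helper names, the simulation settings (proportion 0.1, mean shift 2.0, optional covariance scale `scale1` for the smaller group), the 0.5 cutoffs for LPM and LRM, the equal-prior LDF, and especially the K-means labeling rule (the cluster whose centroid lies nearer the smaller group's training mean is called group 1) are all assumptions. The abstract does not say how clusters were matched to groups.

```python
import numpy as np

def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])

def lpm(X_tr, y_tr, X_te):
    """Linear probability model: OLS on the 0/1 outcome, cut at 0.5."""
    beta, *_ = np.linalg.lstsq(add_intercept(X_tr), y_tr, rcond=None)
    return (add_intercept(X_te) @ beta >= 0.5).astype(int)

def ldf(X_tr, y_tr, X_te, p1=0.5):
    """Two-group LDF with pooled covariance; equal priors by default."""
    X1, X0 = X_tr[y_tr == 1], X_tr[y_tr == 0]
    m1, m0 = X1.mean(axis=0), X0.mean(axis=0)
    S = ((len(X1) - 1) * np.cov(X1, rowvar=False)
         + (len(X0) - 1) * np.cov(X0, rowvar=False)) / (len(X_tr) - 2)
    w = np.linalg.solve(S, m1 - m0)
    score = X_te @ w - 0.5 * w @ (m1 + m0)
    return (score > np.log((1 - p1) / p1)).astype(int)

def lrm(X_tr, y_tr, X_te, n_iter=25):
    """Logistic regression fitted by Newton-Raphson (IRLS); a linear
    predictor >= 0 corresponds to a fitted probability >= 0.5."""
    A = add_intercept(X_tr)
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        W = p * (1 - p)
        beta += np.linalg.solve(A.T @ (A * W[:, None]), A.T @ (y_tr - p))
    return (add_intercept(X_te) @ beta >= 0.0).astype(int)

def kmeans_2(X_tr, y_tr, X_te, n_iter=50, seed=0):
    """Unsupervised 2-means (Lloyd's algorithm). Assumed labeling rule: the
    cluster nearest the smaller group's training mean is labeled group 1."""
    rng = np.random.default_rng(seed)
    C = X_tr[rng.choice(len(X_tr), 2, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X_tr[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([X_tr[labels == k].mean(axis=0) if np.any(labels == k) else C[k]
                      for k in range(2)])
    g1 = np.argmin(((C - X_tr[y_tr == 1].mean(axis=0)) ** 2).sum(-1))
    te_labels = np.argmin(((X_te[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
    return (te_labels == g1).astype(int)

def simulate(n, prop1, delta, scale1, rng):
    """Hypothetical design: group 1 (proportion prop1) has mean (delta, 0) and
    covariance scale1**2 * I; group 0 has mean (0, 0) and covariance I."""
    y = (rng.random(n) < prop1).astype(int)
    sd = np.where(y == 1, scale1, 1.0)[:, None]
    X = rng.standard_normal((n, 2)) * sd + np.outer(y, [delta, 0.0])
    return X, y

rng = np.random.default_rng(1)
X_tr, y_tr = simulate(500, 0.1, 2.0, 1.0, rng)
X_te, y_te = simulate(5000, 0.1, 2.0, 1.0, rng)

methods = {"LPM": lpm, "LDF": ldf, "LRM": lrm, "KM": kmeans_2}
errors = {}
for name, fit_predict in methods.items():
    pred = fit_predict(X_tr, y_tr, X_te)
    errors[name] = (np.mean(pred[y_te == 1] != 1), np.mean(pred[y_te == 0] != 0))
    print(f"{name}: smaller-group error {errors[name][0]:.3f}, "
          f"larger-group error {errors[name][1]:.3f}")
```

Varying `prop1`, `delta`, and `scale1` over a grid, with repeated replications per cell, would roughly mirror the factors named in the abstract (population proportion, group separation, and equality of covariance matrices). The numbers this sketch prints are illustrative and are not the study's results.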
Keywords/Search Tags: Linear discriminant function, Logistic regression, Linear probability, Model, Clustering, Two-group, LDF, Membership