
Influence Analysis Of Linear Regression Model And Diagnosis Of Outliers

Posted on: 2012-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: G S Sun
GTID: 2120330335973128
Subject: Applied Mathematics
Abstract/Summary:
The linear model is a statistical model that gives a useful approximation in economics, medicine, biology, sociology, agriculture, industry, meteorology, geology, and other fields, which has made it one of the most widely used models in modern statistics. Starting from the compatibility of the linear model itself, this thesis discusses influence analysis for the rank-deficient linear model. A transformation reduces influence analysis of the rank-deficient linear model to influence analysis of a full-rank linear model, and the two models are proved to be equivalent before and after the transformation.

On the basis of least squares theory, the thesis discusses variable selection in the linear regression model. Two broad situations are considered: one in which variables that should be included are excluded, and another in which variables that should properly be excluded are included. These may be called the problems of underfitting and overfitting, respectively. Using partitioned matrices, the thesis analyzes both cases in detail and decides whether a variable should be retained by considering jointly whether its regression coefficient is significant and whether the remaining regression coefficients change.

The thesis examines the identification of outliers in the linear model in depth and extends it by summarizing the essential results of earlier researchers. Outliers are classified according to their different influence on the linear model, and their possible causes and treatment are described. Masking and swamping often arise when diagnosing outliers and make diagnosis difficult. The thesis puts forward a new method for identifying multiple outliers. The data are first regarded as divided into two classes: "good" observations (the majority of the data), which reflect the scatter of the underlying population, and "bad" observations (outliers). The goal of the method is to find this true partition and separate the "good" observations from the outlying ones. The proposed method does not require advance knowledge of the number of outliers, and the analyst can choose a significance level α at which observations are considered to be outliers.
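
The masking effect mentioned in the abstract can be illustrated with a small numerical sketch. This is not the method proposed in the thesis, whose details the abstract does not give. It assumes a straight-line fit, made-up data, and a hypothetical helper `deleted_residuals` that uses leave-one-out prediction errors as a simple single-case diagnostic.

```python
import numpy as np


def deleted_residuals(x, y):
    """Leave-one-out prediction errors for a straight-line least squares fit.

    For each case i, the line is refitted without case i and the error of
    predicting y[i] from that refit is returned. (Hypothetical helper used
    only for this illustration.)
    """
    X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept
    errors = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # drop case i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errors[i] = y[i] - X[i] @ beta
    return errors


# Ten "good" observations lying exactly on y = 1 + 2x (assumed data).
x_good = np.arange(10, dtype=float)
y_good = 1.0 + 2.0 * x_good

# Scenario A: a single "bad" observation far from the line.
x_one = np.append(x_good, 20.0)
y_one = np.append(y_good, 0.0)

# Scenario B: two neighbouring "bad" observations.
x_two = np.append(x_good, [20.0, 21.0])
y_two = np.append(y_good, [0.0, 0.0])

single = deleted_residuals(x_one, y_one)
pair = deleted_residuals(x_two, y_two)

print("single outlier, deleted residual:", single[-1])
print("paired outliers, deleted residuals:", pair[-2:])
```

In scenario A, deleting the outlier restores the true line, so its deleted residual is large (roughly 41 in absolute value). In scenario B, deleting one outlier leaves its neighbour in the data, which still pulls the fitted line toward itself, so each deleted residual shrinks to roughly 10. Single-case deletion diagnostics therefore understate how unusual the paired points are, which is the difficulty that multiple-outlier methods aim to address.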
Keywords/Search Tags: linear model, influence analysis, underfitting, outlier, masking