Investigation of the Influence of Dealing With Missing Values in Longitudinal Data on the Analysis Models

Posted on: 2014-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhao
GTID: 2267330425489512
Subject: Statistics
Abstract/Summary:
Longitudinal data has moved data modeling onto a new footing and is regarded as a hot topic in both the theoretical research and the practical application of statistics. Because longitudinal data combines the characteristics of time series data and cross-sectional data, it not only reflects the changing trend of the samples better but also shows the within-group and between-group variation more accurately. In research areas that use longitudinal data, the same subjects are usually observed repeatedly at different points in time to obtain comprehensive observations for analysis. The long observation period and changing experimental conditions, however, easily lead to non-response. In medical statistics and biostatistics, missing data also arises when the people being observed feel uncomfortable, move away, lose interest in the survey, or drop out, so that the observations cannot be completed as planned.
Non-response may occur at any time while longitudinal data is being collected, and when it does, the continuous trend in the data is affected. Missing data lowers data quality and therefore the usability of the data. It also breaks the overall structure of the data and violates the assumptions of the analysis models, which biases the corresponding statistical analysis to varying degrees, reduces the accuracy and precision of the results, confuses the interpretation of the research problem, and further reduces the efficiency of the whole work.
In practice, when longitudinal data with missing values is analyzed, all non-responses are usually deleted and a complete-case analysis is carried out to simplify the problem. This method reduces the sample size to some degree and is the default way of handling missing values in some statistical analysis software.
However, this method gives correct estimates only when the data is missing completely at random and the units with missing values do not differ greatly from the other units. Any violation of these assumptions can introduce error into the estimates and reduce their precision, so that the analysis deviates severely. Trying other methods of handling the missing data and selecting an appropriate analysis model is therefore important to the utility and value of the outcomes.
This paper investigates how the handling of missing values in longitudinal data influences the analysis models. Real longitudinal data on health-related quality of life (see Gejilo et al. 2008) is used to compare the statistical results of complete-case analysis, which directly deletes all non-responses, with those of imputation methods. The EM algorithm and multiple imputation by chained equations are applied in the computational example. The results show that multiple imputation by chained equations consistently performs better than the other two methods. Based on this comparison, a collection of mixed regression models commonly used for analyzing longitudinal data is evaluated on the multiply imputed data set. The results show that handling missing values in longitudinal data with multiple imputation makes fuller use of the information in the data and, together with the statistical model, yields more appropriate results.
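The contrast between complete-case analysis and multiple imputation described above can be illustrated with a minimal sketch. It is not the thesis's implementation and does not use its data. The data set is simulated with two time points, and dropout at follow-up depends on the always-observed baseline, which is an assumed mechanism chosen for illustration. With only one incomplete variable, chained equations reduces to a single regression, so the sketch uses bootstrap-based stochastic regression imputation pooled with Rubin's rules in place of a full chained-equations routine. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def simulate(n=2000, seed=0):
    """Simulated two-wave data: y1 is always observed, y2 may be missing.

    Assumption for illustration: the true mean of y2 is 50 by construction,
    and subjects with a low baseline drop out more often (missingness
    depends only on the observed y1, not on y2 itself).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y1 = rng.gauss(50, 10)
        y2 = 10 + 0.8 * y1 + rng.gauss(0, 5)
        p_miss = 0.7 if y1 < 50 else 0.1
        rows.append((y1, None if rng.random() < p_miss else y2))
    return rows


def fit_line(pairs):
    """Ordinary least squares of y on x; returns intercept, slope, residual SD."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in pairs)
    return intercept, slope, math.sqrt(rss / (len(pairs) - 2))


def complete_case_mean(rows):
    """Mean of y2 after deleting every subject with a missing follow-up."""
    return statistics.fmean(y for _, y in rows if y is not None)


def multiple_imputation_mean(rows, m=20, seed=1):
    """Pooled mean of y2 and its total variance over m imputed data sets."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in rows if y is not None]
    estimates, variances = [], []
    for _ in range(m):
        # Refit on a bootstrap resample so that each imputation reflects
        # uncertainty in the regression parameters.
        boot = [rng.choice(observed) for _ in observed]
        a, b, s = fit_line(boot)
        # Fill each missing y2 with a prediction plus random residual noise.
        completed = [y if y is not None else a + b * x + rng.gauss(0, s)
                     for x, y in rows]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    # Rubin's rules: total variance = within + (1 + 1/m) * between.
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    return pooled, within + (1 + 1 / m) * between


if __name__ == "__main__":
    data = simulate()
    mi_mean, mi_var = multiple_imputation_mean(data)
    print("complete-case mean of y2:", round(complete_case_mean(data), 2))
    print("multiple-imputation mean of y2:", round(mi_mean, 2))
    print("multiple-imputation standard error:", round(math.sqrt(mi_var), 3))
```

In this simulated setting the complete-case mean is pulled away from the true value because the deleted subjects differ systematically from the retained ones, whereas the imputed estimate uses the baseline information of every subject. How large the gap is depends entirely on the assumed dropout mechanism.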
Keywords/Search Tags: Missing Values, Longitudinal Data, Repeated Measurement ANOVA, Multiple Imputation, Mixed Regression Model