
Several Studies Of Models And Methods In Biostatistics

Posted on: 2011-08-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Fang
Full Text: PDF
GTID: 1100330335964968
Subject: Probability theory and mathematical statistics
Abstract/Summary:
With the rapid development of biology in recent years, the analysis of biological data has attracted increasing attention from statisticians. In this thesis, we study several statistical models, methods and theories for biological data, and verify the performance and applicability of the proposed methods via simulations and real examples.

AIDS (Acquired Immune Deficiency Syndrome) is a deadly threat to human beings. At present, HIV (human immunodeficiency virus) dynamics is an active topic in AIDS research. Studies of HIV dynamics can provide important information for understanding the pathogenesis of AIDS and for evaluating treatment efficacy. This thesis first proposes a two-stage method for random coefficient ordinary differential equations describing the dynamics of longitudinal HIV data. In the first stage, we apply the local polynomial kernel method for nonparametric mixed models to estimate the values and derivatives of the state variables. In the second stage, we substitute the first-stage estimates into the ordinary differential equations and propose a maximum pseudo likelihood estimation to obtain the estimates of the unknown parameters. For the estimates of the population effects, we investigate the large sample properties. Through simulation studies and an analysis of clinical AIDS data, we illustrate the satisfactory performance and usefulness of the proposed method. We also point out that the two-stage method can be used for ordinary differential equations in other scientific fields, such as influenza virus dynamics, pharmacokinetics, etc.

The gene chip (also called the DNA microarray) can simultaneously measure the expression levels of thousands of genes. The gene regulatory network (GRN) is an important topic in research on gene expression data, and systems of ordinary differential equations are one of the powerful tools for modeling GRNs. In Chapter 3, we study the data augmentation based pseudo least squares (DA-PLS) method for ordinary differential equation systems.
We prove the consistency and asymptotic normality of the parameter estimates and derive their mean squared error. Under the principle of minimizing the mean squared error of the estimates, we also provide strategies for selecting the bandwidth and the sample size of the augmented data. When gene expression data have few replicates, i.e., a small sample size, data augmentation can extract more information from the original data to achieve better estimates. We apply the DA-PLS method to gene regulatory networks in simulations and real examples, and verify that it substantially improves the accuracy of the estimates compared with the PLS method (Liang & Wu 2008). From both theoretical and simulation perspectives, we provide solid justification for the DA-PLS strategy. The DA-PLS method is also applicable to ordinary differential equation systems in other areas.

Furthermore, detecting differentially expressed genes under different conditions is another active topic in microarray data analysis, and variance estimation for microarray data plays a critical role in this detection. In Chapter 4, we investigate the asymptotics of the permutation simulation extrapolation (PSIMEX) method for variance estimation of microarray data, covering both parametric and nonparametric variance functions. For parametric variance functions, a very general problem is studied: we discuss the consistency and asymptotic normality of the parameter estimates when the chosen model may not be the true variance model. For the nonparametric variance model, we derive the asymptotic normality of the PSIMEX kernel estimates of the variance function and give the optimal bandwidth that minimizes the approximated mean integrated squared error. In addition, using the Monte Carlo method, we construct confidence intervals for the parameters and simultaneous confidence bands for the variance function in the nonparametric case. Simulations show that the confidence intervals and bands perform well.
Two real examples of microarray data are also analyzed to further illustrate the PSIMEX method.

Longitudinal data, in contrast, contain repeated measurements taken on each subject over time, and arise frequently in biological, medical and agricultural studies as well as in other scientific areas. The mixed effects model is a popular tool for longitudinal data. However, the common normality assumption for random effects and errors is not very robust. Moreover, under a non-normality assumption, how to estimate the higher order moments of the random effects and errors is also an interesting problem. In Chapter 5, we propose a moment estimation method for mixed effects models. Under the non-normality assumption, this approach provides estimates of the unknown parameters as well as of the higher moments of the random effects and errors. The consistency and asymptotic normality of the resulting estimates are proved. We also construct confidence intervals and regions for the parameters. Simulation studies are carried out to illustrate the method.
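
As a rough illustration of the two-stage idea described above for ordinary differential equations (smooth the data to estimate the state and its derivative, then plug both into the equation), the following sketch may help. It is not the thesis's method: it assumes a single subject, a hypothetical one-parameter decay model x'(t) = -theta * x(t), a local linear fit with a Gaussian kernel, and ordinary least squares in the second stage in place of the maximum pseudo likelihood estimation for random coefficient models. All function names, the bandwidth and the simulated data are assumptions made for illustration only.

```python
import math
import random


def local_linear(times, obs, t0, bandwidth):
    """Stage 1: local linear kernel fit at t0.

    Returns (estimate of x(t0), estimate of x'(t0)) from a weighted
    least squares fit of obs on (1, t - t0) with Gaussian kernel weights.
    """
    s0 = s1 = s2 = r0 = r1 = 0.0
    for t, y in zip(times, obs):
        d = t - t0
        w = math.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel weight
        s0 += w
        s1 += w * d
        s2 += w * d * d
        r0 += w * y
        r1 += w * d * y
    det = s0 * s2 - s1 * s1
    level = (s2 * r0 - s1 * r1) / det  # fitted intercept: x(t0)
    slope = (s0 * r1 - s1 * r0) / det  # fitted slope: x'(t0)
    return level, slope


def two_stage_decay_rate(times, obs, bandwidth, eval_points):
    """Stage 2: plug the smoothed state and derivative into x' = -theta * x
    and solve for theta by least squares over the evaluation points."""
    num = den = 0.0
    for t0 in eval_points:
        x_hat, dx_hat = local_linear(times, obs, t0, bandwidth)
        num += x_hat * dx_hat
        den += x_hat * x_hat
    return -num / den


# Simulated example (assumed values, not data from the thesis).
rng = random.Random(0)
true_theta = 0.5
times = [0.05 * i for i in range(81)]  # grid on [0, 4]
clean = [math.exp(-true_theta * t) for t in times]
obs = [x + rng.gauss(0.0, 0.02) for x in clean]

# Evaluate away from the boundaries, where kernel fits are less reliable.
interior = [t for t in times if 0.5 <= t <= 3.5]
theta_hat = two_stage_decay_rate(times, obs, 0.3, interior)
print("estimated theta:", round(theta_hat, 3))
```

The appeal of this design is that the differential equation never has to be solved numerically: the second stage is a regression on smoothed quantities. The price is that the result depends on the bandwidth, which is why bandwidth selection receives attention in the abstract.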
Keywords/Search Tags: AIDS/HIV Data, Data Augmentation, Gene Regulatory Network, Maximum Pseudo Likelihood Estimation, Microarray Data, Mixed Effects Models, Moment Estimation Method, Ordinary Differential Equation, Permutation Simulation Extrapolation (PSIMEX)