Design And Analysis Of Some Complex Experiments

Posted on: 2015-04-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Z Huang
GTID: 1220330467465603
Subject: Probability theory and mathematical statistics
Abstract/Summary:
With science and technology now highly advanced, experiments are performed almost everywhere as a tool for studying complex processes and systems. An experiment involves two aspects: design and analysis. The basic idea of the former is that the effectiveness of the latter can be improved by appropriately selecting the values of the control variables; the latter usually refers to statistical inference, such as modeling, variable selection, estimation, prediction, optimization and so forth.
Experiments can be classified into two classes: physical experiments and computer experiments. A physical experiment is one that is implemented in a laboratory, a factory, or an agricultural field, where the experimenters physically carry out the experiment. In contrast, a computer experiment is conducted by running sophisticated computer codes. The majority of this dissertation concerns the design and analysis of computer experiments, taking their special features into account. Specifically, we focus on the following three topics:
1. Design for computer experiments where the designs used need to be divided into slices;
2. Design and analysis for computer experiments with both qualitative and quantitative variables (BQQV);
3. Variable selection for computer experiments where large trends exist.
The types of computer experiments mentioned above have all been developed in recent years and have not yet been thoroughly explored. This dissertation aims to fill some gaps in the current research and to open some new related research directions.
A small part of this dissertation concentrates on variable selection for supersaturated designs (SSDs), an important class of designs for factorial screening experiments that has been studied extensively during the past two decades.
Next we briefly describe the motivations of this dissertation.
With advances in computing technology and numerical methods, computer experiments are increasingly used to simulate physical systems. Latin hypercube designs (LHDs) (McKay, Conover and Beckman, 1979) are one of the most commonly used design types in computer experiments (Santner, Williams and Notz, 2003; Fang, Li and Sudjianto, 2005). As the systems being studied become more and more complex, many improvements to LHDs have been proposed in the literature to meet specific requirements. Sliced Latin hypercube designs (SLHDs), first proposed by Qian (2012), are one variant of LHDs. This design type is motivated by several new complex problems that have recently arisen in computer experiments, such as running a computer model in batches, ensembles of multiple computer models, computer experiments with BQQV, cross-validation and data pooling. However, potentially high correlations among the columns of an SLHD make the subsequent analysis intractable. Therefore, SLHDs with zero or low correlations among columns are called for.
Computer experiments with BQQV are one of the most important applications of SLHDs. For modeling, the Gaussian process (GP) model is the most commonly used (Santner, Williams and Notz, 2003). The key step in building a GP model for a computer experiment with BQQV is to model the correlation structure of the qualitative variables (cf. Qian, Wu and Wu, 2008; Zhou, Qian and Zhou, 2011). However, the work of Han et al. (2010) and this dissertation reveals that, when predicting a response at a given level-combination of the qualitative variables, not all responses at other level-combinations of the qualitative variables are useful. This is because some responses may be only weakly correlated, so the information they share should not be used for modeling. How to filter out the useless information when building GP models for computer experiments with BQQV deserves further study.
Traditionally, the GP model for computer experiments either uses a constant as the mean function or assumes some pre-specified variables in the mean function. Considerable evidence shows that such models can predict poorly when strong trends exist (cf. Joseph, Hung and Sudjianto, 2008; Hung, 2011). Identifying an appropriate mean function is therefore crucial for building an accurate GP model in computer experiments. This is a relatively new topic and only a few works address it.
SSDs are very useful in screening experiments because of their large column-to-run ratios. In the last 20 years, many construction methods for SSDs have been proposed. In contrast, data analysis methods for such designs still leave considerable room for development.
What follows is the organization of this dissertation.
Chapter 1 is the introduction, covering some fundamental knowledge of computer experiments, SSDs, and so on.
Chapter 2 presents a construction method for sliced (nearly) orthogonal LHDs that makes use of existing orthogonal LHDs. The resulting designs have flexible sizes and most are new, and are thus an excellent complement to those constructed by Yang et al. (2013a). With orthogonality or near orthogonality guaranteed, the space-filling property of the resulting designs is also improved.
Chapter 3 studies computer experiments with BQQV.
Most of the literature on computer experiments assumes that all the input variables are quantitative. However, in recent years researchers have often encountered computer experiments with BQQV. In this chapter, a new design type, called the optimal clustered-SLHD, is proposed. The proposed design is a kind of SLHD whose points are clustered in the design region and which possesses good uniformity within each slice. For computer experiments with BQQV, such designs help to capture the correlations between responses at different level-combinations of the qualitative variables. Furthermore, an adaptive analysis strategy intended for the proposed designs is developed. The proposed strategy allows us to automatically extract useful information from all auxiliary responses to increase the precision for the target response. The proposed designs, together with the proposed analysis strategy, are demonstrated to be effective via simulation examples. A real-life example from the food engineering literature is also studied to illustrate the improvements gained from using the proposed design and analysis strategy.
Chapter 4 develops a variable selection method for the mean function of the GP model in computer experiments. The proposed method is Bayesian, and its basic idea is to introduce an indicator vector for all candidate variables for the mean function. The posterior distribution of this indicator vector contains the information relevant to variable selection, and posterior samples from it can be conveniently generated by the Gibbs sampler. Variable selection can then be made based on these samples. A well-known practical example from the computer experiments literature is used to illustrate the implementation and the performance of the proposed method. The superiority of the proposed method over the existing methods is demonstrated via the practical example and some simulation studies. It is shown that the proposed method compares very favorably with the existing methods and performs well in terms of several important measures relevant to variable selection and prediction accuracy.
Chapter 5 develops a Bayesian variable selection procedure for SSDs. The proposed strategy combines the advantages of the componentwise Gibbs sampler (Chen et al., 2011) and the functionally induced priors (Joseph and Delaney, 2007), and is able to keep all possible models under consideration while requiring relatively less time for parameter tuning. Analysis of three commonly used illustrative experiments for SSDs shows that the proposed strategy identifies the same active effects as some existing methods. Simulation studies show that the proposed strategy performs satisfactorily in terms of the true model identified rate, the smallest effect identified rate, the active effects identified rate, the inactive effects identified rate and the model size.
Chapter 6 concludes the work of this dissertation.
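The indicator-vector idea behind the Bayesian variable selection in Chapters 4 and 5 can be illustrated with a small sketch. The code below is not the dissertation's method: it assumes an ordinary linear model, swaps in a generic Zellner g-prior (rather than the functionally induced priors or a GP mean function), and uses a plain componentwise Gibbs sampler over a 0/1 indicator vector. The function names `log_marginal` and `gibbs_select`, the default `g = n`, and the prior inclusion probability `prior_incl` are all hypothetical choices made for illustration.

```python
import numpy as np

def log_marginal(X, y, gamma, g):
    """Log marginal likelihood (up to a constant) of the model picked by the
    boolean indicator vector gamma, under an assumed Zellner g-prior."""
    n = len(y)
    q = int(gamma.sum())
    yty = float(y @ y)
    if q == 0:
        return -0.5 * n * np.log(yty)
    Xg = X[:, gamma]
    # Least-squares fit; lstsq also copes with rank-deficient column sets.
    beta, *_ = np.linalg.lstsq(Xg, y, rcond=None)
    fit = Xg @ beta
    s = yty - g / (1.0 + g) * float(fit @ fit)
    return -0.5 * q * np.log1p(g) - 0.5 * n * np.log(s)

def gibbs_select(X, y, n_iter=2000, burn_in=500, g=None, prior_incl=0.2, seed=0):
    """Componentwise Gibbs sampler over the indicator vector.
    Returns the estimated posterior inclusion probability of each column.
    Assumes the columns of X are (roughly) centered, e.g. balanced +/-1 factors."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    g = float(n) if g is None else float(g)
    y = y - y.mean()                      # remove the intercept
    gamma = np.zeros(p, dtype=bool)       # start from the empty model
    counts = np.zeros(p)
    log_prior_odds = np.log(prior_incl) - np.log1p(-prior_incl)
    for it in range(n_iter):
        for j in range(p):
            # Compare the model with and without variable j, others held fixed.
            gamma[j] = True
            l1 = log_marginal(X, y, gamma, g)
            gamma[j] = False
            l0 = log_marginal(X, y, gamma, g)
            logit = l1 - l0 + log_prior_odds
            prob = 0.5 * (1.0 + np.tanh(0.5 * logit))   # stable logistic
            gamma[j] = rng.random() < prob
        if it >= burn_in:
            counts += gamma
    return counts / (n_iter - burn_in)

if __name__ == "__main__":
    # Hypothetical toy data: 12 runs, 18 two-level factors (more columns than runs).
    rng = np.random.default_rng(42)
    X = rng.choice([-1.0, 1.0], size=(12, 18))
    y = 5.0 * X[:, 0] - 4.0 * X[:, 3] + rng.normal(scale=0.2, size=12)
    print(np.round(gibbs_select(X, y, n_iter=600, burn_in=200), 2))
```

Variables whose estimated inclusion probability exceeds a chosen threshold (0.5 is a common convention) would be declared active. Each sweep refits two models per variable, which keeps the sketch short but would be too slow for large problems.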
Keywords/Search Tags: Bayesian variable selection, Computer experiment, Correlation, Cross-validation, Effect sparsity, Gaussian process, Gibbs sampler, Kriging, Kronecker sum, Latin hypercube design, Main effect, Mixed-level, Orthogonal contrast coding