Research On Sequential Adaptive Variables And Subject Selection

Posted on: 2021-05-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z M Chen
GTID: 1367330602497389
Subject: Statistics
Abstract/Summary:
With the rapid development of science and technology and the continuing fall in data collection costs, the volume of data collected from personal life and industry is growing exponentially. This gives us more data than ever before for modeling and analysis, and it also benefits statistical accuracy. With modern computational and informational techniques, it is now easier to capture a long list of variables. However, from both theoretical and computational perspectives, statisticians building a model with such big data remain hard-pressed to decide efficiently which variables are effective. In addition to variable determination issues, modern, automatically collected data sets (many of which are somewhat disorganized) pose another problem: many contain redundant data points. This imposes a data analysis burden, especially when the number of such data points is large and only limited computational capacity is available. To balance statistical inference needs against the computational cost imposed by the data set size, this thesis seeks to use just enough data to build a statistically interpretable model with effectively selected variables.

The first part of this thesis discusses methods of subject selection in generalized linear models (GLMs) with multiple responses. For a given level of precision in parameter estimation, this chapter builds a framework of sequential subject selection procedures for GLMs under fixed and adaptive designs, respectively. The GLM parameters are estimated by maximum quasi-likelihood, because it relaxes the assumptions on the distribution of the response and so makes the model more practical. At the end of this chapter, the proposed sequential procedure is proved to be asymptotically as efficient as the (unknown) fixed-sample procedure in terms of the ratio of the sample sizes, and its coverage probability is shown to converge to the prescribed level.

Building on the research of Chapter 2, for a given level of precision in parameter estimation and for a GLM, the following chapter proposes a method that sequentially finds the most informative data points in an existing data set and simultaneously selects a high-impact subset of variables during estimation. This method allows us to reduce the sample size used without diminishing study quality. By exploiting the ability of sequential analysis to handle adaptive or random subject recruitment and by using a statistical experimental design criterion, this method adaptively recruits new subjects into the analysis and concurrently mitigates the computational obstacles otherwise created by large sample sizes. At the end of this chapter, both simulated and real data are used to demonstrate the proposed methods for various models and under different estimation strategies.

The final chapter considers the multiplicative regression model, also known as the accelerated failure time model, because of its wide application in economics, finance, and survival analysis, especially for data with a positive response. Because such data are usually collected under an adaptive design, the chapter first studies the asymptotic behavior, including consistency and asymptotic normality, of the multiplicative regression model with an adaptive design under the least product error criterion. A smooth-threshold-based variable selection method is then studied, including its oracle property, consistency, and asymptotic normality. Combining this variable selection method with sequential sampling, the chapter proposes a procedure that sequentially recruits new subjects and simultaneously selects a high-impact subset of variables, and its asymptotic properties are also discussed. To reduce the computational cost of finding the most informative data points in an existing data set, a fast algorithm with O(1) time complexity for adaptively finding new subjects is proposed. Finally, simulations are conducted to assess the performance of the variable selection and subject selection strategies.
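
The idea of sequentially recruiting the most informative subjects from an existing data set until a precision target is met can be illustrated with a small sketch. This is not the thesis's procedure: it assumes an ordinary linear model (no quasi-likelihood weights, no variable selection), uses a greedy D-optimality step, and replaces the fixed-size confidence region stopping rule with a crude proxy (the largest diagonal entry of the inverse information matrix). The function names, the `ridge` start value, and the `tol` threshold are all hypothetical choices made for illustration.

```python
import random

def quad_form(Minv, x):
    """Return x^T Minv x for a square matrix stored as a list of rows."""
    p = len(x)
    return sum(x[i] * Minv[i][j] * x[j] for i in range(p) for j in range(p))

def sherman_morrison(Minv, x):
    """Return (M + x x^T)^{-1} given Minv = M^{-1}, with M symmetric."""
    p = len(x)
    u = [sum(Minv[i][j] * x[j] for j in range(p)) for i in range(p)]  # Minv x
    denom = 1.0 + sum(x[i] * u[i] for i in range(p))
    return [[Minv[i][j] - u[i] * u[j] / denom for j in range(p)]
            for i in range(p)]

def sequential_d_optimal(pool, tol, ridge=1e-3):
    """Greedily recruit rows of `pool` until the precision proxy is <= tol.

    Since det(M + x x^T) = det(M) * (1 + x^T M^{-1} x), the candidate that
    maximizes x^T M^{-1} x gives the largest increase in det(M) at each step.
    """
    p = len(pool[0])
    # Start from M = ridge * I (an assumption, so that M is invertible).
    Minv = [[(1.0 / ridge if i == j else 0.0) for j in range(p)]
            for i in range(p)]
    remaining = list(range(len(pool)))
    chosen = []
    # Stopping proxy (assumption): largest diagonal entry of M^{-1}.
    while remaining and max(Minv[i][i] for i in range(p)) > tol:
        best = max(remaining, key=lambda k: quad_form(Minv, pool[k]))
        Minv = sherman_morrison(Minv, pool[best])
        remaining.remove(best)
        chosen.append(best)
    return chosen, Minv

# Hypothetical pool: an intercept plus two standard normal covariates.
random.seed(0)
pool = [[1.0, random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
chosen, Minv = sequential_d_optimal(pool, tol=0.05)
print(len(chosen), "of", len(pool), "subjects recruited")
```

Each step here scans every remaining candidate, so the search cost grows with the pool size. The abstract's O(1) selection algorithm addresses exactly this cost, but its details are not given, so the sketch makes no attempt to reproduce it.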
Keywords/Search Tags: Sequential Analysis, Generalized Linear Model, Quasi-likelihood Estimating Equations, Variable Selection, Subject Selection, Fixed Size Confidence Regions, Adaptive Variable Shrinkage, Smooth-threshold, D-optimality Criterion