
Statistical Models And Methods For Analyzing High-dimensional Data With Complex Structure

Posted on: 2021-05-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y K Jiang
Full Text: PDF
GTID: 1480306542496604
Subject: Statistics
Abstract/Summary:
High-dimensional data with complex structure is an important type of data in statistical research. In recent years, with the rapid development of science and technology, high-dimensional data and complex structured data have played an increasingly important role in research across different areas. At the same time, the computing power available for analyzing such data has improved significantly. The analysis of high-dimensional and complex structured data is therefore of great significance, and it has gradually become a hot topic in statistical research.

This dissertation studies statistical models and methods for high-dimensional data and complex structured data. It focuses on three types of problems: the sufficient dimension reduction problem for high-dimensional data, the rank aggregation problem for complex structured rank data, and the hypothesis testing problem for data with mediation structure. Methods for all three problems exist in the literature, but many aspects still deserve further research. For example, many methods for sufficient dimension reduction are not efficient in inferring functional relationships of certain structures, and usually give only point estimates of the dimension reduction subspace; many rank aggregation methods cannot differentiate the quality of individual rankings; and the literature offers no rigorous theoretical support for the role of the total-effect test when establishing complementary mediation.

To address these three statistical problems, this dissertation establishes new statistical models and methods. Its main contributions are summarized as follows:

1. For the problem of sufficient dimension reduction, a new Bayesian model is developed. It models the joint distribution of the projected predictor variables and the response variable, and estimates the projection subspace and the joint distribution simultaneously. This dissertation proposes an efficient sampling algorithm for this model and discusses how to make full posterior inference using the posterior samples. Furthermore, the model is more widely applicable and its inference more accurate than existing methods.

2. For the problem of rank aggregation, this dissertation establishes a framework that can differentiate the quality of rankers. The main feature of the model is that, without external data, it can not only distinguish quality differences among the rankers but also provide detailed ranking information for the relevant entities. Two methods of inference are established: Markov chain Monte Carlo sampling, and an iterative algorithm for maximum likelihood estimation. In addition, partial ranking data are handled using missing-data techniques.

3. For mediation analysis, this dissertation proves explicitly that the total-effect test has to be significant whenever the direct and mediated effects bear the same sign and are both significant, as long as least squares estimation and the F-test are used. For the Sobel test, similar conclusions hold when the sample size is large enough. These results support the growing consensus that the total-effect test should be abolished for establishing mediation.

For each research topic, this dissertation uses simulation experiments to demonstrate the effectiveness of the proposed methods. Real data analysis is used to explain how to apply the newly established statistical methods to actual scientific research.
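The mediation result rests on how least squares splits the total effect into a direct and a mediated part. The following is a minimal illustrative sketch, not the dissertation's proof or code: it simulates data from an assumed linear mediation model (the coefficients 0.5, 0.3, 0.4, the sample size, and the function names `ols` and `mediation_effects` are all hypothetical choices made here) and checks the standard OLS identity that the estimated total effect equals the estimated direct effect plus the product of the two mediated-path estimates.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients for y ~ X (X already contains an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mediation_effects(x, m, y):
    """Return (a, b, c_direct, c_total) from the three usual OLS regressions."""
    ones = np.ones_like(x)
    a = ols(np.column_stack([ones, x]), m)[1]                # M ~ X
    _, c_direct, b = ols(np.column_stack([ones, x, m]), y)   # Y ~ X + M
    c_total = ols(np.column_stack([ones, x]), y)[1]          # Y ~ X
    return a, b, c_direct, c_total

# Simulated data under an assumed linear mediation model (illustrative values only).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.3 * x + 0.4 * m + rng.normal(size=n)

a, b, c_direct, c_total = mediation_effects(x, m, y)

# Under OLS the decomposition total = direct + a * b holds exactly (up to rounding).
print(abs(c_total - (c_direct + a * b)) < 1e-10)
```

Because the decomposition is exact under least squares, a direct effect and a mediated effect of the same sign always add up to a larger total effect. This is the intuition behind the claim above; the formal statement about significance under the F-test is the dissertation's own result and is not reproduced here.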
Keywords/Search Tags: Sufficient dimension reduction, rank aggregation, mediation analysis, high-dimensional data, complex structured data