A Study of New Statistical Methods for Pooling Multi-site Datasets and Their Applications

Posted on: 2021-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: X R Liu
Full Text: PDF
GTID: 2370330611480599
Subject: Statistics
Abstract/Summary:
Multi-site data pooling is an important approach to many practical problems and has been applied in research fields such as medicine and geography. Multi-site data fusion methods originated in the 1960s. They integrate data from different sources and then perform statistical analysis on the integrated data. Compared with a single-site model, a multi-site model draws on more of the original information and supports better inference. Existing methods in the literature reflect the funding constraints of research in fields such as biomedicine: most of them pool small multi-site datasets and therefore do not carry over to the many practical problems that arise with multi-site big data. In addition, there are relatively few studies of hypothesis testing for multi-site data fusion, and the existing tests are not robust enough when sample sizes and variances differ across sites. This dissertation therefore draws on statistical machine learning algorithms to address these two problems.

First, as data mining technology improves and data become increasingly available, the dissertation constructs multi-site big data pooling methods based on subsampling. Given the high computational and storage costs of large-scale data analysis, a multi-site big data subsampling fusion method is proposed, built on methods such as uniform sampling and leverage-score importance sampling. Monte Carlo comparisons with single-site inference verify the superiority of the proposed method.

Second, the dissertation applies the parametric bootstrap test to hypothesis testing in multi-site data pooling. Monte Carlo simulation results show that the parametric bootstrap test controls the Type I error rate better than the test proposed in reference [1], and that it performs well when sample sizes and variances differ across sites.
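
The abstract does not give the details of the subsampling fusion procedure, so the following Python sketch is only an illustration of the general idea, not the dissertation's method. It assumes a linear regression model shared by all sites, draws a subsample at each site with either uniform or leverage-score probabilities, and pools the sites by summing their inverse-probability-weighted normal equations. The function names (`leverage_scores`, `subsample_site`, `pooled_subsample_ols`), the pooling rule, and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np


def leverage_scores(X):
    """Row leverage scores h_ii, computed as squared row norms of Q in X = QR."""
    Q, _ = np.linalg.qr(X)  # reduced QR: Q has shape (n, p)
    return np.sum(Q ** 2, axis=1)


def subsample_site(X, y, r, rng, method="leverage"):
    """Draw r rows with replacement from one site and return them with weights.

    method="leverage" samples with probability proportional to h_ii;
    any other value falls back to uniform sampling.
    """
    n = X.shape[0]
    if method == "leverage":
        h = leverage_scores(X)
        probs = h / h.sum()
    else:
        probs = np.full(n, 1.0 / n)
    idx = rng.choice(n, size=r, replace=True, p=probs)
    # Inverse-probability weights keep the weighted normal equations unbiased
    # for the full-data normal equations of this site.
    w = 1.0 / (r * probs[idx])
    return X[idx], y[idx], w


def pooled_subsample_ols(sites, r, rng, method="leverage"):
    """Pool sites by summing weighted normal equations (an assumed pooling rule)."""
    p = sites[0][0].shape[1]
    xtx = np.zeros((p, p))
    xty = np.zeros(p)
    for X, y in sites:
        Xs, ys, w = subsample_site(X, y, r, rng, method)
        xtx += Xs.T @ (w[:, None] * Xs)
        xty += Xs.T @ (w * ys)
    return np.linalg.solve(xtx, xty)


if __name__ == "__main__":
    # Simulated sites with different sample sizes and noise levels (illustrative only).
    rng = np.random.default_rng(0)
    beta = np.array([1.0, -2.0, 0.5])
    sites = []
    for n, sigma in [(20000, 1.0), (50000, 2.0), (10000, 0.5)]:
        X = rng.standard_normal((n, 3))
        y = X @ beta + sigma * rng.standard_normal(n)
        sites.append((X, y))
    for method in ("uniform", "leverage"):
        est = pooled_subsample_ols(sites, r=500, rng=rng, method=method)
        print(method, est)
```

Only the subsampled rows enter the final solve, so each site contributes a p-by-p matrix and a length-p vector rather than its raw data. In this sketch the leverage scores are still computed exactly from the full site matrix; approximating them is a separate question that the abstract does not address.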
Keywords/Search Tags:Multi-site data pooling, big data, Subsampling, Bootstrap method