Projective Ensemble Two Sample Test For Equality Of Distributions In High Dimensions

Posted on: 2023-08-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z M Li
GTID: 1520307307490474
Subject: Management Science and Engineering

Abstract/Summary:
With the rapid development of science and information technology, more and more high-dimensional data appear in various fields, including finance, biology, management and statistics. In genomics, for example, the genetic sequence of a microorganism often reaches thousands of dimensions or more. Mining information from high-dimensional data brings both opportunities for the development of science and great challenges for researchers. As an important technique in data mining, the two-sample test has received increasing attention from scientists. Testing the equality of two distributions is not only one of the most basic problems in statistics, but also of great practical significance. For example, before making production plans, enterprises benefit from studying the demand distribution in order to adjust production operations and inventory management; the two-sample test can be used not only to approximate the demand distribution, but also to test whether the demand distribution differs across periods. As is well known, Student's t-test and Hotelling's T² test are classical two-sample tests. At present, the popular methods are nonparametric tests, such as the energy statistic, the projection-averaging based Cramér-von Mises statistic and the maximum mean discrepancy (MMD) test. These methods perform well only under certain alternatives, and can suffer significant power loss in high dimensions. Motivated by the above discussion, we propose a new robust two-sample test based on a projective ensemble approach, and use an adaptive algorithm to optimize it so as to improve the power of the test in high dimensions. The specific research contents and main contributions are as follows:

1. To address the shortcomings of existing methods, we propose a robust two-sample test based on a projective ensemble approach. This method combines the advantages of the energy statistic and the projection-averaging based Cramér-von Mises statistic. The proposed test statistic has a simple closed-form expression without
any tuning parameters involved, is easy to implement, and can be computed in quadratic time. Moreover, our test is insensitive to the dimension and consistent against all fixed alternatives; it does not require moment assumptions and is robust to the presence of outliers. Building on empirical process theory, we prove that the test statistic asymptotically has a weighted chi-square (χ²) distribution under the null hypothesis H0, with unknown weights that depend on the distribution of the data. Random permutations are therefore needed to approximate the limiting null distribution. However, permutations substantially increase the computational cost of even a single test when the sample sizes are large. Thanks to the projection method, random projections can be used to further reduce the computational cost. Extensive numerical studies indicate that the proposed projective ensemble test is superior to most existing tests, especially in the presence of heavy-tailed distributions. Moreover, it is comparable to the projection-averaging based Cramér-von Mises test in terms of power, but much more efficient computationally. In the empirical study, the projective ensemble test is applied to examine whether the demand on Fridays differs significantly from that on other weekdays. Based on the test results, logistics companies can conduct scheduling and allocation in advance.

2. We further study the asymptotic properties of the projective ensemble test in high dimensions. The results show that if U-statistic theory is used to compute the test statistic, the limiting null distribution becomes standard normal, which greatly reduces the computational cost of implementing the test because no permutation is required to determine the critical values. We then give a general condition under which the alternative can be detected with probability approaching 1. However, this condition proves difficult to satisfy in high dimensions, especially when the difference is
sparse, and the power of the test drops polynomially as the dimension increases. Extensive simulations and empirical studies show that the projective ensemble test still performs well under certain models in high dimensions, but also that its power decreases significantly as the dimension increases under some models.

3. To enhance the power in high dimensions, we propose a new method that replaces the constant weight function with a data-adaptive version that shrinks the weights of unimportant covariates towards zero. The shrinkage is achieved by incorporating a penalty when optimizing the test statistic over a class of weight functions. We show that the newly proposed procedure greatly improves the power because it effectively reduces the data dimension. Finally, numerical simulations show that the proposed projective test with adaptive weights maintains high power against various alternatives. Real data are used to demonstrate the superiority of the proposed method. Moreover, the proposed projective test with adaptive weights still performs well even when white noise is added to the real data.
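
As a rough illustration of the permutation calibration mentioned in the abstract, the sketch below runs a permutation two-sample test in plain Python. It uses the energy statistic (one of the baseline tests named above), not the dissertation's projective ensemble statistic, whose closed form is not given here. The function names `energy_statistic` and `permutation_p_value`, the sample sizes, the number of permutations and the simulated data are all assumptions made for illustration only.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def energy_statistic(xs, ys):
    """V-statistic estimate of the energy distance:
    2 E|X - Y| - E|X - X'| - E|Y - Y'|.
    Computed from all pairwise distances, so the cost is quadratic
    in the total sample size."""
    n, m = len(xs), len(ys)
    cross = sum(euclidean(x, y) for x in xs for y in ys) / (n * m)
    within_x = sum(euclidean(a, b) for a in xs for b in xs) / (n * n)
    within_y = sum(euclidean(a, b) for a in ys for b in ys) / (m * m)
    return 2.0 * cross - within_x - within_y


def permutation_p_value(xs, ys, statistic, n_perm=200, seed=0):
    """Approximate the null distribution of `statistic` by randomly
    reassigning the pooled observations to the two groups."""
    rng = random.Random(seed)
    observed = statistic(xs, ys)
    pooled = list(xs) + list(ys)
    n = len(xs)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as one permutation,
    # which keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: two 5-dimensional Gaussian samples with shifted means.
data_rng = random.Random(1)
x_sample = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(20)]
y_sample = [[data_rng.gauss(1.5, 1.0) for _ in range(5)] for _ in range(20)]
p_value = permutation_p_value(x_sample, y_sample, energy_statistic)
print("permutation p-value:", p_value)
```

Each permutation recomputes a quadratic-time statistic, which makes concrete why the abstract describes permutations as costly for large samples and why a normal limiting null distribution, which needs no permutations, is attractive.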
Keywords/Search Tags: two-sample test, projection, high dimension, dimension reduction, nonparametric