
Specification Testing of Parametric Regression Models for Massive Data Based on Byzantine Failures

Posted on: 2024-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: J C Zhang
GTID: 2530307067491454
Subject: Statistics
Abstract/Summary:
Specification testing is one of the most important problems in statistics. Many effective testing methods have been proposed for small and medium-sized data sets, but in the big data era many of them run into computational and storage limits. Moreover, massive data sets are often collected from multiple sources through different experiments, and in such cases simply judging whether the entire data set conforms to a given model may no longer be sufficient. Parametric specification testing requires a test statistic and its distribution, and when there are many subsets a robust parameter estimate is critical. This thesis brings methods developed for Byzantine failures into specification testing and proposes a parametric specification testing method for massive data sets. The test is a multiple test, so it can identify the heterogeneous sub-datasets. In specification testing for small and medium-sized data, classical test statistics combine nonparametric methods with conditional moment estimation, and their computational complexity reaches O(N^2). In distributed or federated learning under Byzantine failures, the median-of-means (MOM) method is widely used because of its simplicity and effectiveness, and other MOM-based methods have further improved the robustness of learning in recent years. Here, the MOM method is applied to parameter estimation in specification testing for massive data sets, test statistics are constructed on each sub-dataset from the robust estimator, and their asymptotic distribution is established. Assuming there are K sub-datasets, the computational complexity of the proposed test statistics is only O((N/K)^2), which greatly reduces the computational cost. In addition, the Benjamini-Hochberg (BH) method is used to perform multiple testing and to find the heterogeneous sub-datasets while controlling the false discovery rate. Finally, numerical simulations show that the proposed testing scheme performs well.
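
The two generic building blocks named in the abstract, a median-based robust aggregate and the BH step-up rule, can be illustrated with a minimal Python sketch. This is not the thesis's estimator or test statistic: the function names are hypothetical, the coordinate-wise median of per-subset estimates is used only as a simple stand-in for a MOM-type aggregate, and the p-values are assumed to come from whatever per-subset test is applied.

```python
import numpy as np


def median_of_local_estimates(local_estimates):
    """Coordinate-wise median of K per-subset estimates (array of shape K x p).

    Assumption: a simple median-type aggregate, used here only to show why a
    few corrupted (Byzantine) subsets do not dominate the pooled estimate.
    """
    return np.median(np.asarray(local_estimates, dtype=float), axis=0)


def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up rule at level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # ranks p-values ascending
    thresholds = alpha * np.arange(1, m + 1) / m   # i * alpha / m for rank i
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()             # largest rank that passes
        reject[order[:k + 1]] = True               # reject all smaller p-values too
    return reject


# Hypothetical example: four subsets, the last one corrupted.
local = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [50.0, -40.0]]
theta_hat = median_of_local_estimates(local)

# Hypothetical per-subset p-values; flagged subsets are treated as heterogeneous.
flags = benjamini_hochberg([0.001, 0.8, 0.025, 0.6, 0.002], alpha=0.05)
```

In this toy input the corrupted fourth estimate barely moves the aggregate, and the BH rule flags the first, third, and fifth p-values as discoveries at level 0.05.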
Keywords/Search Tags:Massive Data, Specification Testing, Byzantine Failures, Multiple Testing, Parametric Regression Model