With the rapid development of the computer industry and information technology, and with the rise of social networks, online trading, and other emerging technologies and services, people collect and must process more and more kinds of data, and the scale of data keeps growing. Big data has become an active research field, and random sampling plays an important role in the study of large data sets. This thesis studies regression estimation for generalized linear models (GLMs) based on random sampling, and proposes improved importance-sampling algorithms and an optimal sampling design method for GLMs. The specific methods and conclusions fall into the following three parts.

First, two methods are proposed for improving leverage-based sampling. The first computes the distance of each observation from the center of the whole sample when calculating the generalized leverage values: the L observations nearest the center are taken as a pure subset, and the generalized hat matrix is computed from this pure subset. The second draws on the idea of k-means clustering and computes the generalized hat matrix from the valid samples.

Second, the estimation problem under an optimal sampling design for GLMs is discussed. A subsample estimator under general sampling probabilities is constructed, and the optimal sampling probabilities are obtained by minimizing the asymptotic mean squared error of this estimator.

Finally, the statistical properties of the proposed estimators are discussed. The results show that the proposed methods perform better on data with complex structure, and in particular that the optimal sampling design method is superior to the other methods.
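The abstract gives no formulas, so the following Python sketch only illustrates the two ideas under stated assumptions; it is not the thesis's exact algorithm. It assumes logistic regression as the GLM, takes the coordinate-wise median as the "center point" of the sample, fits every subsample by inverse-probability-weighted maximum likelihood, and uses the mMSE probabilities of Wang, Zhu and Ma (2018), π_i ∝ |y_i − μ_i| · ‖M⁻¹x_i‖, as one standard form of sampling probabilities that minimize an asymptotic mean squared error. All function names (`fit_logistic`, `pure_subset_leverage_probs`, `mmse_probs`, `subsample_fit`) and the sizes `L`, `r` and the pilot size are hypothetical. The k-means variant is not sketched.

```python
import numpy as np

def fit_logistic(X, y, w=None, iters=100, tol=1e-10):
    """Weighted logistic-regression MLE by Newton-Raphson; w are observation weights."""
    w = np.ones(len(y)) if w is None else w
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - mu))                       # weighted score
        hess = (X * (w * mu * (1.0 - mu))[:, None]).T @ X  # weighted Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def pure_subset_leverage_probs(X, beta0, L):
    """Assumed form of part one: generalized leverage built from the L rows
    closest to the sample centre (the 'pure subset'), normalized to probabilities."""
    centre = np.median(X, axis=0)                         # assumption: coordinate-wise median
    S = np.argsort(np.linalg.norm(X - centre, axis=1))[:L]
    mu = 1.0 / (1.0 + np.exp(-X @ beta0))
    v = mu * (1.0 - mu)                                   # GLM working weights at the pilot estimate
    M_S = (X[S] * v[S, None]).T @ X[S]                    # X_S^T W_S X_S from the pure subset only
    # diagonal of W^{1/2} X (X_S^T W_S X_S)^{-1} X^T W^{1/2}
    h = v * np.einsum("ij,ij->i", X @ np.linalg.inv(M_S), X)
    return h / h.sum()

def mmse_probs(X, y, beta0):
    """mMSE-type optimal probabilities for logistic regression (Wang, Zhu and Ma, 2018)."""
    mu = 1.0 / (1.0 + np.exp(-X @ beta0))
    M = (X * (mu * (1.0 - mu))[:, None]).T @ X / len(y)
    score = np.abs(y - mu) * np.linalg.norm(np.linalg.solve(M, X.T), axis=0)
    return score / score.sum()

def subsample_fit(X, y, probs, r, rng):
    """Draw r rows with replacement and fit with inverse-probability weights."""
    idx = rng.choice(len(y), size=r, replace=True, p=probs)
    return fit_logistic(X[idx], y[idx], w=1.0 / probs[idx])

# Simulated full data (hypothetical settings).
rng = np.random.default_rng(0)
n, beta_true = 100_000, np.array([0.5, -1.0, 1.0])
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

pilot = rng.choice(n, size=500, replace=False)            # uniform pilot sample
beta0 = fit_logistic(X[pilot], y[pilot])
p_lev = pure_subset_leverage_probs(X, beta0, L=5_000)
p_opt = mmse_probs(X, y, beta0)
beta_lev = subsample_fit(X, y, p_lev, r=2_000, rng=rng)
beta_opt = subsample_fit(X, y, p_opt, r=2_000, rng=rng)
```

For another GLM with a canonical link, only the inverse link and the variance term `mu * (1 - mu)` would need to change; the two-step structure (uniform pilot fit, then sampling with data-dependent probabilities) stays the same. For simplicity the sketch computes `M` on the full data, whereas a pilot-based estimate would avoid a full pass.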