
Statistical Inference Of Distributed Linear Support Vector Machine With Covariates Missing At Random

Posted on: 2022-10-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Bai
GTID: 2480306335454684
Subject: Automation Technology
Abstract/Summary:
The support vector machine is a commonly used statistical learning method for binary classification. In practice, data may be missing for various reasons, a problem that must be faced when analyzing and modeling data. Moreover, with the rapid development of modern science and technology, data volumes are growing rapidly and data collection methods are constantly changing, so distributed storage is now commonly used. In some cases, limits on storage space, memory, data transmission cost, or computing power make it impossible to analyze the data on a single computer. How to use support vector machines to model and analyze massive data sets with missing values in a distributed system is therefore a problem that urgently needs to be solved. We first analyze the linear support vector machine with covariates missing at random, then use a linear support vector machine based on the inverse probability weighting method to address the problem, and finally extend this method to the distributed setting to solve the statistical inference problem of distributed linear support vector machines with covariates missing at random.

For the linear support vector machine with covariates missing at random, this thesis assumes that the missingness mechanism follows a logistic regression model. We first analyze the linear support vector machine with covariates missing at random under centralized data. The linear support vector machine turns out to be relatively robust to missing data compared with traditional statistical models: the estimate may be affected only when some important data are missing. The complete-case method therefore cannot always yield good estimators. To solve this problem, this thesis applies the inverse probability weighting method to the empirical risk function of the linear support vector machine and analyzes the advantages of this approach. We also note that the inverse probability weighted linear support vector machine is a form of the weighted linear support vector machine. Finally, numerical simulations verify that in many cases the results obtained by linear support vector machines based on inverse probability weighting are better than those obtained by linear support vector machines based on the complete-case method.

Because the empirical risk function of the weighted linear support vector machine is non-smooth, it cannot be handled directly by some existing distributed statistical inference algorithms. This thesis uses a kernel function to smooth the empirical risk function and finds that the solution of the weighted linear support vector machine then takes the form of a weighted least squares estimator. Distributed statistical inference for the weighted support vector machine is then obtained based on the weighted least squares method. To carry out this procedure, the observation probability of each sample needs to be estimated in a distributed manner. For this purpose, this thesis uses the weighted least squares method of the generalized linear model to directly study distributed statistical inference for the generalized linear model. Finally, two numerical simulation studies verify the convergence of the algorithm and the validity of the estimation, respectively.
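As a rough illustration of the centralized inverse probability weighting step described in the abstract, the following is a minimal sketch and not the thesis's own implementation. It assumes simulated data, a logistic model for the observation probability fitted by iteratively reweighted least squares, and a plain subgradient solver for the weighted hinge loss. All names (`fit_logistic_irls`, `fit_weighted_linear_svm`, `pi_hat`, the clipping threshold 0.05, the step-size schedule) are hypothetical choices made for this example. The kernel smoothing and the distributed inference steps are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic_irls(Z, d, n_iter=25):
    """Logistic regression via Newton steps (iteratively reweighted least squares)."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])  # add intercept column
    beta = np.zeros(Z1.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Z1 @ beta)
        w = np.clip(p * (1.0 - p), 1e-8, None)  # working weights
        H = Z1.T @ (w[:, None] * Z1)            # weighted Gram matrix Z' W Z
        beta = beta + np.linalg.solve(H, Z1.T @ (d - p))
    return beta

def fit_weighted_linear_svm(X, y, w, lam=1e-2, n_iter=2000, lr=0.1):
    """Subgradient descent on
    (1/n) * sum_i w_i * max(0, 1 - y_i (b + x_i' theta)) + (lam/2) * ||theta||^2."""
    n, p = X.shape
    theta, b = np.zeros(p), 0.0
    for t in range(1, n_iter + 1):
        margin = y * (X @ theta + b)
        active = (margin < 1.0) * w             # weights of margin-violating points
        g_theta = -(X.T @ (active * y)) / n + lam * theta
        g_b = -np.sum(active * y) / n
        step = lr / np.sqrt(t)                  # decaying step size (assumption)
        theta = theta - step * g_theta
        b = b - step * g_b
    return theta, b

# Simulated data (assumption): x1 is always observed, x2 may be missing.
n = 2000
y = rng.choice([-1.0, 1.0], size=n)
x1 = y + rng.normal(size=n)
x2 = y + rng.normal(size=n)

# Missing at random: the observation probability depends only on (x1, y).
pi_true = sigmoid(0.5 + 1.0 * x1 + 0.5 * y)
delta = rng.binomial(1, pi_true)                # 1 = x2 observed, 0 = x2 missing

# Step 1: estimate the observation probabilities with a logistic model.
Z = np.column_stack([x1, y])
beta_hat = fit_logistic_irls(Z, delta)
pi_hat = sigmoid(np.column_stack([np.ones(n), Z]) @ beta_hat)
pi_hat = np.clip(pi_hat, 0.05, 1.0)             # guard against huge weights (assumption)

# Step 2: weighted linear SVM on complete cases with weights 1 / pi_hat.
cc = delta == 1
X_cc = np.column_stack([x1[cc], x2[cc]])        # x2 is used only where it is observed
theta, b = fit_weighted_linear_svm(X_cc, y[cc], 1.0 / pi_hat[cc])

# Evaluate on the full simulated data (x2 is known here only because it is simulated).
X_full = np.column_stack([x1, x2])
acc = np.mean(np.sign(X_full @ theta + b) == y)
print("estimated missingness coefficients:", beta_hat)
print("classification accuracy:", acc)
```

Setting all weights to 1 in the call to `fit_weighted_linear_svm` gives the complete-case fit, which can be compared against the weighted fit on the same simulated data.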
Keywords/Search Tags: Linear support vector machine, Missing at random, Empirical risk function, Inverse probability weighted method, Distributed statistical inference