Font Size: a A A

Study On The Approximation Of Wasserstein Distance In Optimal Transport Model And Unequal Dimensional Transport Problem

Posted on:2022-07-23Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y J LinFull Text:PDF
GTID:1480306569471144Subject:Control Science and Engineering
Abstract/Summary:PDF Full Text Request
The optimal transport(OT)problem was proposed by French mathematician Gaspard Monge.The model aims at pushing forward the source probability mass to the target probability distribution with minimal cost,and the Wasserstein distance in the model not only defines the distance between probability distributions,but also has an intuitive geometric characteristic for the space of probability measures.In addition,even when the supports of the measures have no overlap,the Wasserstein distance can still catch the meaningful geometric features,which makes it better than Kullback-Leibler divergence in this respect.However,due to the large computational cost for the Wasserstein distance,it is rarely applied in the field of data science.In 2013,French mathematician Cuturi proposed the entropy regularization technique to the original model,and used the Sinkhorn iterative algorithm to approximate the solution of the OT model with very low computational cost,which greatly improved the calculation of the OT model,and made the OT model regain its attention.Subsequently,Wasserstein distance has been successfully applied in machine learning,statistical analysis,computer graphics and other related fields.However,there are still many problems in the calculation and application of the OT model.The OT model is used to character the Generative Adversary Networks(GAN).When the GAN model is chartered by OT,the neural network parameterizes the mapping from the low dimensional feature space to the high dimensional feature space,and thus the model is actually an unequal dimensional OT problem.If we neglect this problem,the training of the model and the accuracy of the computational results will be affected.Besides,in the application of OT model,it is necessary to calculate the Wasserstein distance between two measures,and we in practice use empirical measures to approximate the Wasserstein distance.However,when calculating the large scale OT loss with high-dimensional data,the sample complexity increases exponentially with the dimension,and the convergence speed is slow and the stability of algorithm is affected due to the influence of dimension disaster and sample noise.Therefore,it is of great significance and application value to study the estimation of Wasserstein distance and unequal dimensional OT problem.To solve these problems,this paper proposes the following solutions based on projection and coupling decomposition techniques.(1)In order to solve the problem of unequal dimensional OT problem when GAN is described by the OT model,we proposed a multi-projection method to project simultaneously the source distribution and the target distribution into multiple low dimensional subspaces simultaneously and construct corresponding OT model to obtain the transport mappings pushing forward the projected source distribution to the projected target distribution in these subspaces.We finally derive that the original unequal dimensional OT problem can be equivalently changed into a multi-marginal OT problem,prove the existence and uniqueness of the solution for this problem,and give the explicit expression of the solution.Subsequently,we use parameter method to approximate the objective function of the multi marginal OT problem,and the differentiability and statistical convergence properties are studied to ensure that the problem can be solved by the first order method.(2)To resolve the influence of dimension disaster and sample noise in Wasserstein distance estimation in the case of high-dimensional distribution,this paper impose a low dimensional manifold structure to the original OT model,and decompose the coupling,viz.,force rank regularization.We call the improved model as the extended spiked model.Under this model,the coupling is a weighted sum of several product measures.Under this condition,source and target measures both have a partition,viz.,the source and target measures is written as a weighted sum of measures,which makes the model robust to the sampling noise by using empirical measures in the computation.In addition,the source and target measures of this model are projected simultaneously into the low dimensional subspace can reduce influence of dimension disaster to the model.For this model,we study the upper bound of the approximation error of the OT loss,and give the convergence result of the empirical Wasserstein distance and the corresponding statistical properties(3)The entropy regularized OT model can be transformed into a projection problem with respect to Kullback Leibler divergence,and this problem can linearly converge to the unique global optimal solution by domain decomposition algorithm.In this paper,we study the stability and related statistical properties of this algorithm when it uses minibatch sample to learn Wasserstein distance on small decomposition cell.By studying the dual problem of entropy regularization OT problem on small cells,this paper proves that in the domain decomposition algorithm,the quality of gradient estimation can be guaranteed and the high-precision solution can be obtained by optimizing with mini-batch data.In addition,we provide the concentration inequality of Wasserstein distance on the decomposition cell,and the results show that the number and size of small batch samples will affect the convergence result.
Keywords/Search Tags:Optimal transport model, Wasserstein distance, Approximation algorithm, Extended spiked model, Stability, Convergence
PDF Full Text Request
Related items