
Regularization Method And Application Of Low-Rank Matrix Factorization

Posted on: 2013-02-02    Degree: Doctor    Type: Dissertation
Country: China    Candidate: K K Zhao    Full Text: PDF
GTID: 1100330395973488    Subject: Computational Mathematics
Abstract/Summary:
Low-rank matrix factorization problems arise in data mining and many other fields, including dimension reduction in data mining; structure from motion, motion segmentation, and other problems in computer vision; and collaborative filtering in personalized recommendation systems. The fundamental question underlying all of these applications is how to solve the low-rank matrix factorization problem. Many matrix factorization models have been proposed over the last decades. In real applications, however, the factorization models may be ill posed, for example when a high percentage of the data is missing. How to regularize the basic model to avoid over-fitting to the observed data, and how to design useful algorithms for the regularized model, are therefore challenging problems. Regularization is also an important way to incorporate additional data information into a matrix factorization model, so improving the model with a regularization term is useful in real applications.

Low-rank matrix factorization with missing data has become an effective technique for collaborative filtering, since it can generate high-quality predictions. However, the performance of low-rank factorization critically depends on how the low-rank model is regularized to mitigate over-fitting to the observed data. In some fields related to computer vision, low-rank matrix factorization with missing data may also produce an optimal solution with little physical meaning. Previous work by other researchers has focused mainly on finding more stable and efficient algorithms for the existing matrix factorization models; it seldom discusses how to regularize the model so that it becomes more effective and easier to solve. Beyond that, some researchers have proposed graph-regularized (nonnegative) matrix factorization models, where the graph regularization tries to preserve the neighborhood relations of the original data points and to improve the effectiveness of the factorization. However, these graph-regularized models make the optimization problem hard to solve, even without missing data. In real applications such as collaborative filtering, the matrix is generally large and sparse, so the regularization technique considered here must be simple and efficient.

Aiming at the problems above, this paper gives a detailed discussion of our progress on regularizing the low-rank matrix factorization model. The regularization techniques proposed here are closely tied to the real applications, and their effectiveness depends on the datasets. For the missing-data problem, we give a theoretical analysis of the sensitivity of the basic factorization model, considering both the case in which an extremely large percentage of the entries is unknown and the case with a smaller percentage of missing data. For the representation of data with network information, we propose a new graph-regularized matrix factorization model. We also propose fast and efficient numerical algorithms; in real applications with large-scale datasets, these new algorithms for the regularized models perform much better than previous ones. The contributions of this paper are as follows:

(1) We give a theoretical analysis of the sensitivity of the original model with missing data in Section 2.2, and show that the basic model may have multiple solutions; moreover, two optimal solutions can be arbitrarily far from each other. Two kinds of sufficient conditions for this catastrophic phenomenon are given. In the general case, we also give a lower bound on the error between an ε-optimal solution, as obtained in practical computation, and a theoretically optimal solution.

(2) We propose a novel regularization technique, which we call inducible regularization, for low-rank matrix factorization when an extremely large percentage of the entries is missing, in Section 3.2. We develop two methods to solve the regularized problem. One is the inducible alternating least squares algorithm (IALS), which drives down the errors on the observed entries and on the unknown entries simultaneously. The other is the inducible stochastic gradient descent algorithm (ISGD), which gives much higher accuracy as the rank of the approximating matrix increases. Both algorithms give higher accuracy than the corresponding algorithms for the original model. (A generic sketch of regularized alternating least squares over observed entries is given after this list.)

(3) We consider a model with constraints on the missing entries for low-rank matrix factorization when about half of the entries are missing, in Section 4.2. The model is suitable for some applications related to computer vision and can recover the missing data with high precision. We propose a successive alternating least squares algorithm (SALS) to solve the model with bounded-entry constraints. It minimizes each of the two factors quickly in each step of the alternating iteration, and the effectiveness and convergence of the algorithm are improved. Compared with the ALS and Newton algorithms for the basic model, SALS improves both the accuracy and the computational cost.

(4) We propose a new graph-regularized low-rank matrix factorization to deal with data that carries network information, in Section 5.1. This new regularized model has globally optimal, closed-form solutions. A direct algorithm and an alternating iterative algorithm with inexact inner iterations are proposed to solve the new model for data with a small or a large number of points (columns), respectively. A convergence analysis shows the global convergence of the iterative algorithm. (A sketch of how a closed-form subproblem arises in a typical graph-regularized model is also given below.)

(5) We present synthetic examples and many real-world examples arising in collaborative filtering, 3D reconstruction, and data representation with network information. We compare the proposed algorithms for the regularized models with the corresponding algorithms for the original models, and the experimental results confirm the efficiency of the regularization techniques for matrix factorization proposed in this paper.
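As a rough illustration of the kind of technique described above, the following Python sketch shows a standard regularized alternating least squares update that fits a low-rank factorization using only the observed entries. It is a minimal, generic sketch, not the thesis's IALS, ISGD, or SALS algorithms; the function name, the Tikhonov penalty lam, and all parameter choices are illustrative assumptions.

```python
import numpy as np

def regularized_als(X, mask, rank, lam=0.1, n_iters=20, seed=0):
    """Fit X ≈ U @ V.T on the observed entries only (mask == True).

    Generic sketch: the lam * ||.||_F^2 penalty keeps each least
    squares subproblem well posed even when most entries are missing.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_iters):
        # Update each row of U from the columns observed in that row.
        for i in range(m):
            cols = mask[i]
            Vi = V[cols]
            U[i] = np.linalg.solve(Vi.T @ Vi + reg, Vi.T @ X[i, cols])
        # Update each row of V from the rows observed in that column.
        for j in range(n):
            rows = mask[:, j]
            Uj = U[rows]
            V[j] = np.linalg.solve(Uj.T @ Uj + reg, Uj.T @ X[rows, j])
    return U, V
```

Without the lam term, a row or column with very few observed entries would make the corresponding normal equations singular or severely ill conditioned, which is exactly the over-fitting and sensitivity issue the abstract discusses.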
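The closed-form behavior mentioned in contribution (4) can be illustrated with a common graph-regularized formulation. The sketch below is an assumption-laden stand-in, not the exact model proposed in the thesis: it assumes the objective ||X - U V||_F^2 + lam * tr(V L V^T) with a graph Laplacian L over the data points, in which case the optimal coefficients V for a fixed basis U solve a Sylvester equation.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def graph_regularized_coeffs(X, U, L, lam=1.0):
    """Given a basis U (d x k), data X (d x n), and a graph Laplacian
    L (n x n), minimize  ||X - U V||_F^2 + lam * trace(V L V.T)  over V.

    Setting the gradient to zero gives the Sylvester equation
        (U.T @ U) V + V (lam * L) = U.T @ X,
    so this subproblem has a closed-form solution.  Illustrative sketch
    only; the thesis's graph-regularized model may differ in detail.
    """
    return solve_sylvester(U.T @ U, lam * L, U.T @ X)
```

The appeal of this structure is that each alternating step stays a linear problem with a global optimum, which is why such models can admit closed-form or cheaply computable updates even at moderate scale.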
Keywords/Search Tags: Low-rank matrix factorization, missing data, regularization technique, alternating least squares algorithm, convergence analysis, collaborative filtering