Research On Performance Optimization Of Large-Scale Matrix Factorization

Posted on:2020-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:J K Geng

GTID:2428330626464593

Subject:Computer Science and Technology

Abstract/Summary:

As a fundamental machine learning technique, matrix factorization (MF) has been widely applied in practice. However, once the volume of training data grows large enough, MF can no longer be completed on a single machine, which calls for distributed solutions that execute MF efficiently across multiple nodes.

Traditional distributed solutions adopt SGD (Stochastic Gradient Descent)-based algorithms to run MF iteratively. However, the typical MapReduce-based or PS (parameter server)-based solutions fail to complete the training process efficiently and suffer from significant communication cost and disk I/O overhead, which can greatly prolong the makespan and waste resources. Meanwhile, because of their incast (many-to-one) communication pattern, such solutions constrain the large-scale deployment of RDMA and cannot harness its performance benefit. Considering these drawbacks, this dissertation proposes a novel distributed solution to large-scale MF, named Rima, which abandons the centralized architecture and leverages decentralized ring-based parallelism to achieve efficient and scalable MF. Rima adopts ring-based model parallelism to eliminate the centralized bottleneck and achieve high-performance training. In addition, Rima uses a "one-step transformation" strategy to halve the communication workload and improve bandwidth efficiency. Rima also uses three "partial randomness" strategies to improve the robustness of the algorithm and guarantee the convergence of the solution. Furthermore, Rima leverages a "predefined pattern sequence" strategy that lets each node read the required data in advance, so that disk I/O overlaps with computation and communication and heavy disk I/O does not prolong the training time.

The experiments show that, compared with DSGD, Rima reduces the training time by up to 68.7% under TCP transmission and by up to 85.4% under RDMA transmission, demonstrating that Rima outperforms traditional solutions.
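The ring-based idea in the abstract can be illustrated with a small single-process simulation. The sketch below is not Rima's implementation: it only shows the general pattern of block-wise SGD in which each worker keeps its own user-factor block while item-factor blocks move around a ring, so every transfer is one-to-one rather than many-to-one. The function names (`ring_mf`, `rmse`), the modulo-based block assignment, and all hyperparameters are assumptions made for illustration; the "one-step transformation", "partial randomness", and "predefined pattern sequence" strategies are not modeled.

```python
import random


def ring_mf(ratings, n_users, n_items, workers=2, rank=2,
            epochs=300, lr=0.05, reg=0.01, seed=0):
    """Simulate ring-rotated block SGD for R ~ W * H^T in one process."""
    rng = random.Random(seed)
    W = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_users)]
    H = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_items)]

    # Bucket ratings by (user block, item block). Worker p owns user block p.
    # Assigning blocks by index modulo `workers` is an illustrative choice.
    blocks = {}
    for u, i, r in ratings:
        blocks.setdefault((u % workers, i % workers), []).append((u, i, r))

    for _ in range(epochs):
        for step in range(workers):
            # On a real ring these per-worker updates would run in parallel:
            # the blocks touched in one step share no user and no item.
            for p in range(workers):
                q = (p + step) % workers  # item block held by worker p now
                for u, i, r in blocks.get((p, q), []):
                    wu, hi = W[u], H[i]
                    err = r - sum(a * b for a, b in zip(wu, hi))
                    for k in range(rank):
                        wk, hk = wu[k], hi[k]
                        wu[k] += lr * (err * hk - reg * wk)
                        hi[k] += lr * (err * wk - reg * hk)
            # Here each worker would pass its item block to a single ring
            # neighbour, so no node receives from many senders at once.
    return W, H


def rmse(ratings, W, H):
    """Root-mean-square error of the factorization on the given ratings."""
    se = sum((r - sum(a * b for a, b in zip(W[u], H[i]))) ** 2
             for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5
```

For example, calling `ring_mf` on a small list of `(user, item, rating)` triples and comparing `rmse` before and after training shows the error dropping, while the schedule guarantees that two workers never update the same item block in the same step.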

Keywords/Search Tags:

matrix factorization, RDMA, communication efficiency, training speed

Related items

1	Matrix Factorization In The Application Of Data Mining
2	Research And Application Of Matrix Factorization In Recommender Systems
3	A General RDMA Network Platform For Data Centers
4	The Study Of Nonnegative Matrix Factorization And Rough Set Theory
5	Research On Hybrid Recommendation Algorithms Based On Matrix Decomposition
6	Nonnegative Matrix Factorization Algorithm Based On The Regularized Method And Its Applications
7	Research On Manifold Embedding Matrix Factorization Algorithm
8	Research On Non-negative Matrix Factorization Algorithm
9	Design And Implementation Of GPU-accelerated RDMA Encrypted Communication Scheme
10	Robust Low-rank Matrix Factorization And Its Application