| As a fundamental machine learning technique, matrix factorization (MF) has been widely applied in practice. However, as the volume of training data grows, MF can no longer be completed on a single machine, which necessitates distributed solutions that execute MF efficiently across multiple nodes. Traditional distributed solutions adopt SGD (stochastic gradient descent)-based algorithms to run MF iteratively. However, typical MapReduce- or PS (parameter server)-based solutions fail to complete the training process efficiently and suffer from significant communication cost and disk I/O overhead, which can greatly prolong the makespan and waste resources. Meanwhile, because of their incast (many-to-one) communication pattern, such solutions constrain the large-scale deployment of RDMA and cannot harness its performance benefits. Considering these drawbacks, this dissertation proposes Rima, a novel distributed solution to large-scale MF that abandons the centralized architecture and leverages decentralized ring-based parallelism to achieve efficient and scalable MF. Rima adopts ring-based model parallelism to eliminate the centralized bottleneck and achieve high-performance training. In addition, Rima uses a "one-step transformation" strategy to halve the communication workload and improve bandwidth efficiency. Rima also uses three "partial randomness" strategies to improve the robustness of the algorithm and guarantee convergence. Furthermore, Rima leverages a "predefined pattern sequence" strategy that lets each node read the required data in advance, overlapping disk I/O with computation and communication so that heavy disk I/O does not prolong training time. Experiments show that, compared with DSGD, Rima reduces the training time by up to 68.7% under TCP transmission and by up to 85.4% under RDMA transmission, demonstrating that Rima outperforms traditional solutions. |
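
To make the idea of ring-based model parallelism for SGD-based MF concrete, the following is a minimal single-process sketch. It is not Rima's actual algorithm: it shows only the generic block-rotation pattern in which each worker keeps its own block of user factors and, after every step, forwards the item-factor block it holds to its ring neighbour. The "one-step transformation", "partial randomness", and prefetching strategies are not modelled. All names (`ring_schedule`, `sgd_block`, `partition`, `run_epoch`) and the hyperparameters are hypothetical, and the workers are simulated sequentially rather than run in parallel.

```python
import random


def ring_schedule(num_workers):
    """Return schedule[step][worker] = index of the item-factor block held.

    Worker p starts with item block p; after each step every worker passes
    its block to the next worker on the ring (p -> p + 1).
    """
    schedule = []
    held = list(range(num_workers))
    for _ in range(num_workers):
        schedule.append(list(held))
        # Worker p receives the block previously held by worker p - 1.
        held = [held[(p - 1) % num_workers] for p in range(num_workers)]
    return schedule


def partition(ratings, num_workers):
    """Split (user, item, rating) triples into a num_workers x num_workers grid."""
    blocks = [[[] for _ in range(num_workers)] for _ in range(num_workers)]
    for u, i, r in ratings:
        blocks[u % num_workers][i % num_workers].append((u, i, r))
    return blocks


def sgd_block(ratings, W, H, lr=0.02, reg=0.01):
    """Plain SGD updates for one block; W and H are lists of factor vectors."""
    for u, i, r in ratings:
        wu, hi = W[u], H[i]
        err = r - sum(a * b for a, b in zip(wu, hi))
        for k in range(len(wu)):
            wk, hk = wu[k], hi[k]
            wu[k] += lr * (err * hk - reg * wk)
            hi[k] += lr * (err * wk - reg * hk)


def run_epoch(blocks, W, H, num_workers):
    """One epoch: every worker sees every item block exactly once."""
    for held in ring_schedule(num_workers):
        # Within a step, the (worker, item block) pairs touch disjoint rows
        # of W and H, so a real system could run them in parallel.
        for p, q in enumerate(held):
            sgd_block(blocks[p][q], W, H)


def squared_error(ratings, W, H):
    return sum(
        (r - sum(a * b for a, b in zip(W[u], H[i]))) ** 2 for u, i, r in ratings
    )


if __name__ == "__main__":
    random.seed(0)
    workers, rank = 4, 2
    # Synthetic rank-1 ratings, used purely for illustration.
    data = [(u, i, (u % 3 + 1) * (i % 2 + 1)) for u in range(8) for i in range(8)]
    W = [[random.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(8)]
    H = [[random.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(8)]
    grid = partition(data, workers)
    print("error before:", squared_error(data, W, H))
    for _ in range(30):
        run_epoch(grid, W, H, workers)
    print("error after:", squared_error(data, W, H))
```

Because the blocks held in any one step form a permutation of the item blocks, no two workers update the same factors at the same time, and each worker only ever exchanges data with its ring neighbours rather than with a central server.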