Distributed Large Scale Matrix Computing Algorithm Design

Posted on:2024-07-14

Degree:Master

Type:Thesis

Country:China

Candidate:T F He

Full Text:PDF

GTID:2568307091497004

Subject:Software engineering

Abstract/Summary:


With the growing popularity of the mobile Internet, data of many kinds, such as voice, image, and video, is growing explosively, which makes the processing of massive data an important challenge. In this context, Hadoop and its ecosystem have become the de facto standard for big data processing, providing diverse applications and tools for handling massive data. At the same time, massive data processing is driving the rapid development of parallel computing for machine learning and data mining algorithms. Matrix algorithms underlie many of these algorithms, so their parallel implementation is particularly important. Spark, a newer computing framework that addresses shortcomings of the MapReduce framework and integrates seamlessly with the Hadoop ecosystem, is well suited to implementing parallel matrix algorithms. In this thesis, we build a small parallel matrix library on top of Spark. The library provides several representations of distributed matrices, such as row-partitioned (block-by-row) and block-partitioned (block-by-block) forms, and implements parallel matrix multiplication and the computation of the Smith normal form. Parallel matrix multiplication is the core of the other parallel matrix algorithms. We first implement a Spark-based parallel matrix multiplication algorithm that can handle large-scale dense matrices, and we analyze its bottlenecks through experiments. We then optimize multiplication by the elementary transformation matrices that arise in Smith normal form computation, which greatly reduces the amount of data transmitted over the network and improves the performance of the algorithm. Existing parallel algorithms for the Smith normal form run only on a single machine and cannot handle large-scale distributed matrices. Therefore, this thesis proposes a greatest common divisor algorithm that supports parallel Smith normal form computation on block-partitioned matrices and implements it in the Spark computing framework; the new algorithm can handle larger-scale matrix operations. Finally, this thesis reports experiments that verify the algorithm's correctness and scalability.
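The block-by-block multiplication described above can be illustrated with a minimal single-process sketch. It is not the thesis's Spark implementation: it only imitates the usual pattern of keying blocks by (block row, block column), pairing A's block (i, k) with B's block (k, j), and summing partial products per output key (i, j). All function names here (`split_blocks`, `block_multiply`, `assemble`) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def split_blocks(matrix, block_size):
    """Split a dense matrix (list of rows) into {(block_row, block_col): block}."""
    blocks = {}
    n_rows, n_cols = len(matrix), len(matrix[0])
    for r0 in range(0, n_rows, block_size):
        for c0 in range(0, n_cols, block_size):
            block = [row[c0:c0 + block_size] for row in matrix[r0:r0 + block_size]]
            blocks[(r0 // block_size, c0 // block_size)] = block
    return blocks


def multiply_dense(a, b):
    """Plain local product of two small dense blocks."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def add_dense(a, b):
    """Element-wise sum of two blocks of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block_multiply(a_blocks, b_blocks):
    """Multiply two block-partitioned matrices (same block size assumed)."""
    # Group B's blocks by block-row index k, standing in for a shuffle/join.
    b_by_row = defaultdict(list)
    for (k, j), block in b_blocks.items():
        b_by_row[k].append((j, block))

    # For each A block (i, k), multiply with every B block (k, j) and
    # accumulate per output key (i, j), standing in for a reduce-by-key.
    result = {}
    for (i, k), a_block in a_blocks.items():
        for j, b_block in b_by_row[k]:
            partial = multiply_dense(a_block, b_block)
            key = (i, j)
            result[key] = add_dense(result[key], partial) if key in result else partial
    return result


def assemble(blocks):
    """Stitch a {(i, j): block} dictionary back into one dense matrix."""
    n_block_rows = max(i for i, _ in blocks) + 1
    n_block_cols = max(j for _, j in blocks) + 1
    rows = []
    for i in range(n_block_rows):
        for r in range(len(blocks[(i, 0)])):
            row = []
            for j in range(n_block_cols):
                row.extend(blocks[(i, j)][r])
            rows.append(row)
    return rows
```

For example, with `A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, `B = [[1, 0], [0, 1], [2, 3]]`, and a block size of 2, `assemble(block_multiply(split_blocks(A, 2), split_blocks(B, 2)))` gives `[[7, 11], [16, 23], [25, 35]]`, the same as the direct product. In a distributed setting the grouping step is where blocks cross the network, which is why reducing the data moved in that step matters for performance.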

Keywords/Search Tags:

Matrix Computation, Parallel Computing, Matrix Multiplication, Smith Normal Form Computation, Spark Platform


Related items

1	Spark Based Large Scaled Matrix Algorithms
2	Study And Implementation On Distributed Large Scale Matrix Computation Algorithms With Spark
3	The Architecture Of Matrix Large Scale Computation And Its Application To DOA Estimation And MIMO Receiver
4	Research On Methods For GPU Based Parallel Acceleration Of Matrix Computation
5	Secure Matrix Computation Based On Full Homomorphic Encryption And Its Applications
6	Parallel Algorithms And Architectures For Matrix Computations On FPGA
7	Research On SVM Accelerated Computation Based On Sparse Matrix Multiplication
8	The Research Of Matrix Multiplication Efficiency Based On MPI
9	Research Of Key Technologies On Privacy Protection For Secure Outsourcing Of Matrix Numerical Computation
10	Study On Secure Outsourcing Schemes Of Matrix Computing In Cloud Computing Environment