
High-Performance Parallel Optimization Algorithm Based On GPU

Posted on: 2015-08-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Li
Full Text: PDF
GTID: 1228330467985942
Subject: Communication and Information System
Abstract/Summary:
As high-performance parallel computing devices have become increasingly popular, and in particular with the rapid development of the graphics processing unit (GPU), application services built on GPU-based parallel computing platforms have attracted great interest from researchers at home and abroad. Because traditional CPUs and computing clusters are limited in both computing resources and energy consumption, while the demand for parallel computing in all kinds of science and engineering optimization problems keeps growing, scalable acceleration models, simulation, optimization algorithms, and numerical computing based on high-performance parallel computing have become hot research topics in academia. High-performance parallel optimization algorithms are the key link between the underlying parallel computing platform and the upper-layer application services, and both the optimization of algorithm performance and the expansion of the application space remain challenging. There is still considerable room to improve high-performance parallel computing applications, and a strong need to advance parallel algorithms and numerical methods. On this basis, this dissertation focuses on research and innovation in random number generation, intelligent optimization algorithms, and numerical algorithms for high-performance parallel computing. Drawing on random number generator theory, the ant colony algorithm, and least-squares estimation, and employing GPU-accelerated scalable models, GPU local optimization, and GPU parallel optimization techniques, we design three key parallel optimization solutions. The main contributions of this dissertation are as follows:

(1) To address the slow speed of traditional random number generators and the poor scalability of existing acceleration models, we analyze current acceleration models and the mechanisms of random number generators, and propose a new scalable acceleration model for GPU-based parallel random number generation.
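The dissertation does not reproduce CUDA-RNG's internals here, but the key property that makes random number generation scale on a GPU is statelessness: each thread derives its numbers from a (seed, stream id, counter) triple, so any sample can be computed independently and in any order. The sketch below illustrates that counter-based idea on the CPU using SplitMix64-style mixing; the function and parameter names are illustrative assumptions, not the author's code.

```python
# Minimal CPU sketch of the substream idea behind GPU-parallel RNGs:
# every (stream_id, counter) pair maps to one output with no shared state,
# so "threads" can fill their slices of the sequence in parallel.

MASK64 = (1 << 64) - 1

def mix64(x: int) -> int:
    """SplitMix64 finalizer: bijective mixing of a 64-bit integer."""
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9 & MASK64
    x = (x ^ (x >> 27)) * 0x94D049BB133111EB & MASK64
    return (x ^ (x >> 31)) & MASK64

def counter_rng(seed: int, stream_id: int, counter: int) -> float:
    """Uniform float in [0, 1) for (stream, counter) -- stateless, so each
    pair can be evaluated independently by any thread."""
    word = mix64(((seed ^ mix64(stream_id)) + 0x9E3779B97F4A7C15 * counter) & MASK64)
    return word / 2**64

# Each simulated thread fills its own slice without touching the others:
n_threads, per_thread = 4, 3
samples = [[counter_rng(42, t, c) for c in range(per_thread)]
           for t in range(n_threads)]
```

On a real GPU, `stream_id` would be the global thread index and `counter` the per-thread sample index, which is what allows a block configuration such as 256 threads per block to generate one long sequence cooperatively.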
Based on this model, we propose a new and simple high-performance parallel random number generation algorithm (CUDA-RNG). The algorithm makes full use of the GPU's cooperative computing capability to generate random number sequences with high efficiency. Our experimental results show that this generator achieves up to a 189.32× speedup over the sequential implementation, with a small memory overhead, when using 256 threads per block.

(2) Because the ant colony algorithm has difficulty reaching optimal solutions on large-scale optimization problems, we focus on improving its performance and efficiency in a GPU parallel computing environment, inspired by the natural parallelism of the ant colony algorithm. By modeling the ant colony algorithm on the Traveling Salesman Problem (TSP), we propose a new ant colony optimization algorithm based on CUDA (a general-purpose parallel computing architecture), referred to as GACO. The algorithm combines characteristics of MMAS and ACS: a hybrid pheromone matrix update, dynamically constructed nearest-neighbor shortest-path candidate lists, and a multi-channel ant distribution strategy. Finally, we analyze the GPU performance optimization scheme. With these optimizations, our algorithm achieves both higher speed and better solution quality than algorithms of the same type. Tested on TSP benchmark instances, our experimental results show speedups of up to 40.1× and 35.7× over sequential implementations of ACS and MMAS, respectively.

(3) To address the excessive computation time and memory cost of singular value decomposition in large-scale least-squares problems, we propose an iterative split-and-merge singular value decomposition least-squares estimation method (IDMSVD) based on a distributed architecture.
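To make the pheromone and heuristic mechanics concrete, the following is a compact serial sketch of ant colony optimization for the TSP. It does not reproduce GACO's CUDA kernels, hybrid MMAS/ACS update, or candidate lists; the parameters (alpha, beta, rho) are the standard ACO ones rather than values from the thesis, and the MMAS-style best-ant deposit is an assumed simplification. The inner per-ant loop is the part a GPU would run in parallel, one ant (or one city choice) per thread.

```python
import random

# Serial sketch of ant colony optimization for the TSP. Each ant builds a
# tour city by city, choosing the next city with probability proportional
# to pheromone^alpha * (1/distance)^beta; pheromone then evaporates and the
# best-so-far tour is reinforced.

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def aco_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]            # pheromone matrix
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):                    # ants are independent:
            start = rng.randrange(n)               # this loop is what a GPU
            tour = [start]                         # would parallelize
            unvisited = set(range(n)) - {start}
            while unvisited:
                i = tour[-1]
                cand = list(unvisited)
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cand]
                j = rng.choices(cand, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            L = tour_length(tour, dist)
            if L < best_len:
                best_tour, best_len = tour, L
        # evaporate, then let the best-so-far ant deposit (MMAS-style):
        tau = [[(1 - rho) * t for t in row] for row in tau]
        for i in range(n):
            a, b = best_tour[i], best_tour[(i + 1) % n]
            tau[a][b] += 1.0 / best_len
            tau[b][a] += 1.0 / best_len
    return best_tour, best_len
```

Because tour construction, tour evaluation, and the pheromone update each touch independent data, all three map naturally onto GPU kernels, which is the source of the speedups reported above.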
By analyzing the advantages and disadvantages of existing SVD-based least-squares estimation methods, and by exploiting the GPU and various distributed parallel computing architectures, this dissertation proposes an iterative split-and-merge singular value decomposition least-squares estimation method. The algorithm effectively reduces the computation time and memory requirements of singular value decomposition in large-scale least-squares problems. Finally, we demonstrate the effectiveness of the algorithm on GPUs and other distributed parallel computing devices.

The computational methods and procedures proposed in this dissertation are broadly applicable and can easily be transferred to other parallel computing equipment, such as multi-core CPUs or large clusters. Hybrid architectures combining high-performance CPUs and GPUs (such as GPU clusters) and GPU-FPGA hybrid architectures are also expected to be applicable to the study of high-performance parallel optimization algorithms.
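The split-and-merge idea can be illustrated with a small two-level sketch: split the tall matrix A into row blocks, take a thin SVD of each block on a separate worker, and merge only the compressed factors before a final SVD, so no single node ever holds all of A. This is an assumed simplification for illustration; the thesis's iterative and distributed details (and the name `split_merge_lstsq`) are not from the original.

```python
import numpy as np

# Blockwise least squares via split-and-merge SVD. Each row block A_i is
# reduced to the k x n summary S_i @ Vt_i (and b_i to U_i^T b_i); the
# summaries are stacked and one final SVD solves the merged problem. This
# is exact for min ||Ax - b|| because each U_i has orthonormal columns.

def split_merge_lstsq(A, b, n_blocks=4):
    rows = np.array_split(np.arange(A.shape[0]), n_blocks)
    compressed, rhs = [], []
    for idx in rows:                          # each block fits one worker
        U, s, Vt = np.linalg.svd(A[idx], full_matrices=False)
        compressed.append(s[:, None] * Vt)    # S_i Vt_i: compressed block
        rhs.append(U.T @ b[idx])              # b_i projected into same basis
    M = np.vstack(compressed)                 # merge the small summaries
    c = np.concatenate(rhs)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt.T @ ((U.T @ c) / s)             # pseudoinverse solve
```

The memory saving is the point: each worker stores only its own block plus an (at most) n-by-n summary, rather than the full m-by-n matrix, and the merge step works on n_blocks small factors instead of all m rows.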
Keywords/Search Tags: GPU Parallel Computing, Random Number Generator, Scalable Model, Ant Colony Optimization, Iterative Split and Merge