Parallel Algorithm Study of Lattice Boltzmann Method Based on GPU

Posted on: 2022-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Zhu
GTID: 2480306485486154
Subject: Software engineering
Abstract/Summary:
The lattice Boltzmann method is a fluid modeling and simulation method that has been developed internationally over the past few decades. It combines advantages of the microscopic molecular dynamics model and the macroscopic continuum model of fluids, and is a mesoscopic model between the two. Owing to its clear physical background and mesoscopic character, the lattice Boltzmann method has been widely used in fields that are difficult to simulate effectively with traditional methods, such as micro-scale flow and heat transfer, porous media, biological fluids, magnetic fluids, and crystal growth. In addition, the lattice Boltzmann method is naturally parallel, so it is particularly well suited to running on multi-core processors.

NVIDIA GPUs are built around a scalable array of multithreaded streaming multiprocessors, which allows the GPU architecture to span a wide range of markets simply by scaling the number of multiprocessors and the memory capacity. In 2006 NVIDIA launched CUDA (Compute Unified Device Architecture), a general-purpose parallel computing platform and programming model that addresses the problem of transparently scaling parallel application software to take advantage of the increasing number of processor cores. CUDA comes with a software environment that allows developers to program the GPU through supported programming languages such as CUDA C, CUDA Python, and CUDA Java, much as in traditional programming.

Many scholars have explored efficient implementations of the lattice Boltzmann method on NVIDIA GPUs. However, these studies lack a theoretical basis and data support, and most of them rely on experience to judge the performance bottleneck of the program. The GPU is a collection of complex components connected in a many-to-many network, so optimization should comprehensively consider the workload of each unit. This paper uses the latest performance analysis method proposed by NVIDIA at GTC 2019 (GPU Technology Conference), combined with a new generation of performance analysis tools, to analyze and optimize the performance of the algorithm from the perspective of GPU hardware.

On the other hand, in a class of simulated flow fields with complex geometry, such as porous media, the fluid lattices usually occupy only a small part of the whole grid, and the lattices that do not participate in the evolution are randomly scattered in the flow field, which destroys the locality of the data accessed by the program. If the conventional simulation method is still used, a large amount of memory is wasted and operating efficiency is extremely low. This paper proposes an efficient GPU scheme suitable for complex geometric simulation. Compared with two typical schemes, the scheme in this paper has the best performance. The main work is as follows:

(1) We first use Poiseuille flow in a three-dimensional cylinder to illustrate the use of the performance analysis methods and tools. This example uses curved boundary conditions, and the missing distribution functions at each boundary are processed by a single thread. Threads are therefore organized into one-dimensional thread grids and one-dimensional thread blocks, with only one thread block in each thread grid. Guided by the Nsight Compute performance analysis tool and the peak performance percentage analysis method, we change the number of thread blocks in the thread grid to increase the degree of parallelism, and the optimized performance improves by about 71%. We then use this set of tools and methods to analyze the performance bottlenecks of simulation based on the lattice Boltzmann method on the new Volta architecture. The first is the memory layout. We introduce the layout of the distribution function in detail and implement three layouts: AOS, SOA, and CSOA. The results show that SOA has the best performance. Then, to further improve memory throughput, we merge the two kernels, remove the memory space allocated to the lattice attributes, and replace it with registers. The optimized kernel performance increases by 20%. Finally, we compare the performance of the Push scheme (collision executed before propagation) and the Pull scheme (propagation executed before collision), and apply shared memory optimization to both schemes. The results show that the performance improvement of the Pull scheme is about 10% higher than that of the Push scheme.

(2) For flow fields with complex geometry, we analyze the problems of simulating them on the GPU and implement two typical solutions. Both schemes are optimized with comprehensive GPU optimization methods guided by performance analysis, such as moving frequently used data from global memory to registers and using the SOA layout for lattice attributes. The performance of the optimized schemes improves by about 8%. We point out the shortcomings of the two solutions: the indirect addressing scheme repeatedly stores the lattice coordinates and causes additional memory load, and the semi-direct addressing scheme launches threads at full-matrix scale, which reduces operating efficiency. We then design an efficient GPU solution for complex geometric simulation, using an addressing scheme with a cyclic pointer structure to locate the storage location of each lattice. Based on CUDA unified memory, the forward pointer is used to determine the memory address of a fluid lattice, and the reverse pointer is used to restore the coordinates of the fluid lattice in the original flow field. For the simulation of natural convection of the anterior aqueous humor in a three-dimensional human eye with multiple lattice types, we carefully arrange the storage of the various types of lattice data to better meet the requirements of coalesced access to global memory. Because the solution in this paper reduces the total number of memory loads and stores, it has the best performance.

In summary, this paper uses NVIDIA's latest performance analysis methods and a new generation of performance analysis tools, instead of empirical judgment, to locate the performance bottlenecks of the program from the perspective of GPU hardware and to provide data support for subsequent optimization. For flow fields with complex geometry, the efficient GPU solution based on the cyclic pointer addressing method proposed in this paper not only greatly reduces memory usage but also significantly improves simulation efficiency.
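
The abstract does not spell out the cyclic pointer data structure, so the following is only a minimal host-side sketch of the general idea it describes: a forward map from grid cells to compact storage slots for fluid lattices, and a reverse map from slots back to grid coordinates. All names (`build_index_maps`, `slot_to_xy`, `soa_offset`), the 4x3 grid, and the fluid mask are hypothetical, and plain Python lists stand in for GPU arrays. The last helper assumes an SOA layout in which each direction of the distribution function is stored contiguously over the fluid slots.

```python
# Illustrative sketch only; not the thesis implementation.
# Function names, the grid size, and the fluid mask are assumptions.

def build_index_maps(is_fluid, nx, ny):
    """Return (forward, reverse) index maps for a 2D grid.

    forward[cell] is the compact slot of a fluid cell, or -1 for a
    non-fluid cell. reverse[slot] is the linear grid index of that slot.
    """
    forward = [-1] * (nx * ny)
    reverse = []
    for y in range(ny):
        for x in range(nx):
            cell = y * nx + x
            if is_fluid[cell]:
                forward[cell] = len(reverse)  # next free compact slot
                reverse.append(cell)
    return forward, reverse


def slot_to_xy(slot, reverse, nx):
    """Recover the (x, y) grid coordinates of a compact slot."""
    cell = reverse[slot]
    return cell % nx, cell // nx


def soa_offset(direction, slot, n_fluid):
    """Offset of one distribution value in an SOA array over fluid slots."""
    return direction * n_fluid + slot


# A small grid in which only 4 of 12 cells are fluid.
nx, ny = 4, 3
is_fluid = [
    False, True,  True,  False,
    False, False, True,  False,
    True,  False, False, False,
]
forward, reverse = build_index_maps(is_fluid, nx, ny)
n_fluid = len(reverse)

print(n_fluid)                          # 4
print(reverse)                          # [1, 2, 6, 8]
print(slot_to_xy(2, reverse, nx))       # (2, 1)
print(soa_offset(3, 2, n_fluid))        # 14
```

With maps like these, storage and thread counts scale with the number of fluid lattices rather than with the full grid, and consecutive slots of one direction sit next to each other in memory, which is the property that coalesced access relies on.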
Keywords/Search Tags: Lattice Boltzmann method, GPU, CUDA, complex geometry simulation