| In recent years, the architecture of high-performance computers has evolved rapidly, and GPU-based heterogeneous systems have become one of the most popular designs. Compared with a homogeneous system, a heterogeneous system that integrates CPUs and GPUs can provide better computing performance at lower power consumption. Heterogeneous systems are well suited to data-parallel processing and are widely applied in industrial manufacturing, biomedicine, geophysical exploration, weather forecasting, and other fields. However, heterogeneous environments remain difficult to program. How to use the multi-core CPU and the many-core GPU of a heterogeneous system efficiently is a central and difficult problem in heterogeneous computing. The brute-force attack on MD5 Crypt and the Lared-P algorithm are two typical data-parallel algorithms. The data-parallel tasks of the former are independent, whereas the latter involves complex data dependencies. In this paper, we focus on the collaborative computing of the two algorithms on heterogeneous systems. The main work is as follows: (1) An MPI-OpenMP-CUDA based method for collaborative computing on heterogeneous systems: MPI is used for collaboration among the homogeneous nodes; OpenMP is used for controlling and scheduling the CPU and GPU; CUDA is used for GPU computing. (2) The parallelization of the brute-force attack on MD5 Crypt on a large-scale heterogeneous system. We first divide the whole keyspace into several sub-keyspaces of the same size and dispatch them evenly to the nodes with MPI. On each node, we use OpenMP threads to distribute the tasks across the CPU and GPU. We also use CUDA threads to exploit data parallelism on the GPU and introduce several optimizations for it. To address load imbalance, we adopt a two-way attack in which the CPU starts from the head and the GPU starts from the tail.
For multi-process cracking across multiple nodes, we adopt a master-slave mode so that cracking can be correctly resumed from a breakpoint. The results show that the brute-force attack on MD5 Crypt verifies 43 thousand passwords per second using the CPU only and 250 thousand using both the CPU and GPU. Running on the whole system, 1.8 billion passwords can be checked per second, which poses a new challenge to the security of MD5 Crypt. (3) The parallelization of Lared-P on a heterogeneous system. We first use MPI processes to divide the simulation space into patches, which serve as the basic scheduling unit, and then use OpenMP threads to distribute the patches dynamically across the CPU and GPU. To parallelize current deposition on the GPU, we propose a SIMT-based solution that eliminates write conflicts by having the threads within a warp perform the current deposition in lockstep, and adopts an eight-color scheme to avoid conflicts between threads of different warps. To parallelize data initialization and use the GPU efficiently, the GPU is shared by two or more processes so that GPU processing can run asynchronously with data initialization. For data transfer between the host and the device, page-locked memory and minimal transfer are used to reduce the time cost of large-scale data transfer. The results show that the final code based on collaborative computing achieved a 29-fold speed-up in double precision on one node compared with one core. |
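
The two-way attack in item (2) can be illustrated with a minimal sketch. It is not the authors' implementation: two Python threads stand in for the CPU and GPU workers, and the names `two_way_search`, `check`, `cpu_chunk`, and `gpu_chunk` are hypothetical, as are the chunk sizes. The idea shown is only that one worker claims candidates from the head of a keyspace and the other from the tail, so the faster worker simply covers more of the range and neither sits idle until the two meet.

```python
import threading


def two_way_search(keyspace_size, check, cpu_chunk=4, gpu_chunk=64):
    """Search [0, keyspace_size) from both ends until the workers meet.

    `check(key)` is a hypothetical predicate that returns True for the
    matching key. Returns (found_key_or_None, keys_checked_per_worker).
    """
    lock = threading.Lock()
    state = {"head": 0, "tail": keyspace_size, "found": None}
    counts = {"cpu": 0, "gpu": 0}

    def claim(name, chunk):
        # Atomically take the next chunk from the head (cpu) or tail (gpu).
        with lock:
            if state["found"] is not None or state["head"] >= state["tail"]:
                return None
            if name == "cpu":
                lo = state["head"]
                hi = min(lo + chunk, state["tail"])
                state["head"] = hi
            else:
                hi = state["tail"]
                lo = max(hi - chunk, state["head"])
                state["tail"] = lo
            return lo, hi

    def worker(name, chunk):
        while True:
            claimed = claim(name, chunk)
            if claimed is None:
                return  # keyspace exhausted or key already found
            lo, hi = claimed
            for key in range(lo, hi):
                if check(key):
                    with lock:
                        state["found"] = key
                    return
            with lock:
                counts[name] += hi - lo

    threads = [
        threading.Thread(target=worker, args=("cpu", cpu_chunk)),
        threading.Thread(target=worker, args=("gpu", gpu_chunk)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["found"], counts


if __name__ == "__main__":
    found, counts = two_way_search(1000, lambda k: k == 777)
    print("found:", found, "chunks fully checked per worker:", counts)
```

Because the boundary between the two workers is decided at run time rather than fixed in advance, no prior estimate of the CPU-to-GPU speed ratio is needed in this sketch; the larger `gpu_chunk` only reflects the assumption that the GPU processes candidates in bigger batches.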