| Solving large sparse linear systems fast lies in the heart of many numerical simulation problems in scientific and engineering computing. Direct methods and iterative methods are the two main solving methods. The former have stable calculation process, measurable computation and high precision, while the later need less storage and time. In practical applications, preconditioning technologies are used to improve the coefficient matrix, so as to reduce the storage cost in direct methods and the divergence risk in iteration methods. The transformed coefficient matrix is of denser spectrum distribution, less fill-ins or block forms, etc. These characters are helpful for the improvement of stability, the reduction of storage, or the exploitation of parallelism. Thus the parallel direct and iteration method can develop along with the progress in preconditioning studies.The main contributions of the thesis are as follows:1. In solving linear systems with typical structures, based on the PIE (parallel implicit elimination) process (corresponding to WZ decomposition), this thesis extends the WZ factorization of the symmetric tridiagonal matrix to both the general tridiagonal matrix and the symmetric p-tridiagonal matrix. For the general tridiagonal matrix, the method removes the necessity to be symmetry of the WZ factorization. And, for the symmetric p-tridiagonal matrix, the bandwidth of the three diagonal is expanded to any positive integer p. Furthermore, based on the two WZ factorizations mentioned above, this thesis presents and implements two parallel algorithms, called PTri algorithm and PpSTri algorithm, to solve their corresponding linear systems. These methods follow the principle of divide-and-conquer, the overlapping between computation and communication, and the segmentation of the coefficient matrix. Theoretical analysis and numerical experiments show that, compared to other similar algorithms, our parallel algorithms can balance the workload and reduce waiting time among processors effectively. Especially, the PTri algorithm achieved an efficiency of 22% higher over the LUO algorithm[22]. And, when p=1, the PpSTri algorithm is 7.8% faster than the RAO algorithm[35].2. In solving linear systems with general structures, the traditional Krylov subspace methods have difficulties in improving the parallel efficiency due to the high cost of synchroning. So, based on the BiCR algorithm, this thesis proposes the s-BiCR algorithm to reduce the numbers of synchronous communication and memory access. The parallel s-BiCR algorithm is then implemented by employment of the distributed matrix-vector multiplication. Theoretical analysis and numerical experiments show that s-BiCR owns preferable features of parallelism and data locality. In the parallel implementation, extensive experimental results indicate that the speed and precision of the s-BiCR algorithm are higher than the similar algorithm s-BiCG[2].3. For the preconditioning techniques for linear systems, this thesis studies the minimum degree ordering algorithm for the nonsymmetric matrix to reduce fill-ins of the elimination process. Traditional ordering algorithms can not reflect the changes of fill-ins when appling to the nonsymmetric matrix. To address this problem, the equivalent expression of the fill-ins of the elimination process is presented in the original bipartite graph. The proposed method makes use of several new concepts: the eliminated set, the reachable set, the condition Q of paths and the eliminated key edge set and the two specific classes of paths, i.e. the eliminated key edge path and the pivot path. Furthermore, the variation characterization of matrix nonzero structure is studied. And then, this thesis put forwards a method to compute the reachable sets of nonsymmetric matrices and verify that the length of the search path is no more than three. Next, this thesis finds the different uneliminated vertices that have the same adjacencis and then combine these vertices, and also discusses the methods to combine the eliminated vertices by classfying the relationships among them. On the basis of the theory discussed above, an approximate minimum degree ordering algorithm is specifically designed for the sparse, nonsymmetric structure of matrices, which is called NonAMD. Experimental results show that, among similar methods, NonAMD achieves the lowest growth rate of non-zero elements in the LU factorization of the transformed matrices, which proves that the study of minimum degree ordering algorithm direct to the nonsymmetric structure of matrices is valid to reduce fill-ins. |