| In this study, we identify and resolve several bottlenecks that hinder the march of unstructured, adaptive, implicit finite element methods toward petascale simulations. With those obstacles resolved, our method demonstrates strong scalability on large-scale supercomputers and solves problems of interest that require numerically intensive computation in a reasonable time frame. The performance of our implicit solver is improved by two algorithms developed in this work. The first algorithm, multiple compute-object based partition improvement, incrementally improves the load balance, and hence the scalability, of both the equation formation and the equation solution phases of the finite element analysis (FEA). The second algorithm, data reordering, makes effective use of the memory subsystem by increasing data locality, which accelerates the per-core performance of the FEA. We present excellent strong scaling for several applications run on various supercomputers, including IBM Blue Gene (BG/L and BG/P), Cray (XT3 and XT5), and the Sun Constellation Cluster. Two of the applications are flow simulations: one of a bifurcation pipe model with relatively small meshes, and one of cardiovascular flow in an abdominal aortic aneurysm model with a much larger mesh (more than 1 billion elements). A third application simulates blood flow in a "whole" body model composed of 78 arteries, from the neck to the toes. We use these applications to investigate the effectiveness of the methodologies and algorithms developed in this work. With the ability to solve real-world problems with complex geometry and physics in a realistic time, this work provides a reliable and efficient computational tool that researchers can use for design and development purposes. |
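
To make the idea behind data reordering concrete, the following is a minimal sketch of one generic locality-improving technique: renumbering mesh nodes in breadth-first order so that neighboring nodes receive nearby indices. It is not the algorithm developed in this work, whose details the abstract does not give; the function names `bfs_reorder` and `bandwidth` and the six-node chain are hypothetical and chosen only for illustration.

```python
from collections import deque


def bfs_reorder(adjacency):
    """Return new_index, where new_index[old] is the breadth-first label of node old.

    Illustrative assumption: adjacency is a list of neighbor sets, one per node.
    """
    n = len(adjacency)
    new_index = [-1] * n
    next_id = 0
    for seed in range(n):  # loop over seeds so disconnected pieces are covered
        if new_index[seed] != -1:
            continue
        new_index[seed] = next_id
        next_id += 1
        queue = deque([seed])
        while queue:
            node = queue.popleft()
            for nbr in sorted(adjacency[node]):
                if new_index[nbr] == -1:
                    new_index[nbr] = next_id
                    next_id += 1
                    queue.append(nbr)
    return new_index


def bandwidth(adjacency, index):
    """Largest index distance between two connected nodes (a crude locality proxy)."""
    return max(
        (abs(index[a] - index[b]) for a in range(len(adjacency)) for b in adjacency[a]),
        default=0,
    )


# Hypothetical example: a six-node chain whose labels are scrambled.
chain = [3, 0, 5, 1, 4, 2]
adjacency = [set() for _ in range(6)]
for a, b in zip(chain, chain[1:]):
    adjacency[a].add(b)
    adjacency[b].add(a)

new_index = bfs_reorder(adjacency)
before = bandwidth(adjacency, list(range(6)))
after = bandwidth(adjacency, new_index)
print(before, after)  # prints 5 2
```

A smaller index distance between connected nodes means the data touched while assembling or solving on one element is more likely to sit close together in memory, which is the general effect that locality-oriented reordering aims for.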