Multimedia SIMD extensions are commonly employed today to speed up media processing. When vectorizing for SIMD architectures, one of the major issues is the handling of memory alignment. Prior research has focused either on vectorizing loops in which all memory references are properly aligned, or on introducing extra operations to deal with misaligned memory references. Meanwhile, multi-core SIMD architectures require coarse-grain parallelism, yet most traditional applications are written as single-threaded programs. Thus an important problem remains unaddressed: how to parallelize and vectorize loop nests while also reducing data misalignment. This paper presents a framework for nested loop transformation, including dependence-graph and offset-collection construction, analysis, transformation, and code generation. We propose a statement-migration-based loop transformation scheme that systematically maximizes the parallelism of the outermost loops while reducing the misaligned memory references in the innermost loops. The core of our technique is to align each level of the loop nest subject to the constraints imposed by dependence relations. To reduce data misalignment, we establish a mathematical model built on the concept of an offset collection and propose an effective heuristic algorithm. For coarser-grain parallelism, we propose rules for analyzing the outermost loop; when transformations are applied, the inner loops are also involved so as to maximize parallelism. To avoid introducing additional data misalignment, the involved innermost loop is handled differently from the other loop levels. Based on these transformation results, we then describe SIMD code generation in detail. Experimental results indicate that 7% to 37% (18.4% on average) of misaligned memory references can be eliminated.
Simulations on the Cell processor show that a speedup of 1.1 is achieved by reducing misaligned data accesses, while a speedup of 6.14 is achieved by enhancing multi-core parallelism, for an overall average speedup of 6.67.