In W. Cheng and A. S. M. Sajeev, editors, Proceedings of the 6th Annual Australasian Conference on Parallel And Real-Time Systems (PART '99), Springer-Verlag, 1999.
Abstract
Research on the high-performance implementation of nested data parallelism
has, over time, covered a wide range of architectures. Scalar and vector
processors as well as shared-memory and distributed-memory machines were
targeted. We are currently investigating methods to integrate this
technology into a single portable compiler back-end. Essential to our
approach are two program transformations, flattening and calculational
fusion, which even out irregular parallelism and increase locality of
reference, respectively. We generate C code that makes use of a portable,
light-weight, collective-communication library. First experiments on scalar,
vector, and distributed-memory machines support the feasibility of the
approach.
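
The two transformations named in the abstract can be illustrated with a small sketch. It is written in Python only for readability and is not the compiler's generated C code or the paper's formal definition. All function names are hypothetical. It assumes the common representation of a nested sequence as a flat data vector plus a vector of segment lengths, and it shows fusion as the hand-merging of a map and a segmented reduction into one pass without an intermediate vector.

```python
from itertools import accumulate


def flatten(nested):
    """Turn a nested list into (flat data, segment lengths)."""
    data = [x for seg in nested for x in seg]
    lengths = [len(seg) for seg in nested]
    return data, lengths


def segmented_sum(data, lengths):
    """Sum each segment of the flat vector; one result per sub-list."""
    ends = list(accumulate(lengths))
    starts = [e - n for e, n in zip(ends, lengths)]
    return [sum(data[s:e]) for s, e in zip(starts, ends)]


def sum_of_squares_unfused(data, lengths):
    """Two passes: build a vector of squares, then reduce it per segment."""
    squares = [x * x for x in data]  # intermediate vector
    return segmented_sum(squares, lengths)


def sum_of_squares_fused(data, lengths):
    """One pass: square and accumulate together, no intermediate vector."""
    out, pos = [], 0
    for n in lengths:
        acc = 0
        for x in data[pos:pos + n]:
            acc += x * x
        out.append(acc)
        pos += n
    return out
```

After flattening, the irregular sub-list sizes live only in the segment-length vector, so the work over the flat data vector can be divided evenly regardless of how uneven the original nesting was. The fused variant computes the same result while touching each element once, which is the locality benefit the abstract attributes to fusion.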
PostScript version (14 pages).
This page is part of Manuel Chakravarty's WWW-stuff.