BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121114T180000Z
DTEND:20121114T183000Z
LOCATION:255-EF
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Most stencil computations allow tile-wise concurrent start, i.e., there=0Aalways exists a face of the iteration space and a set of tiling=0Ahyperplanes such that all tiles along that face can be started=0Aconcurrently. This provides load balance and maximizes parallelism.=0AHowever, existing automatic tiling frameworks often choose hyperplanes=0Athat lead to pipelined start-up and load imbalance. We address this=0Aissue with a new tiling technique that ensures concurrent start-up as=0Awell as perfect load balance whenever possible. We first provide=0Anecessary and sufficient conditions on tiling hyperplanes to enable=0Aconcurrent start for programs with affine data accesses. We then provide=0Aan approach to find such hyperplanes. Experimental evaluation on a=0A12-core Intel Westmere shows that our code outperforms a tuned=0Adomain-specific stencil code generator by 4 to 20 percent, and previous=0Acompiler techniques by a factor of 2 to 10.14.
SUMMARY:Tiling Stencil Computations to Maximize Parallelism
PRIORITY:3
END:VEVENT
END:VCALENDAR