BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121114T180000Z
DTEND:20121114T183000Z
LOCATION:255-EF
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Most stencil computations allow tile-wise concurrent start, i.e., there=0Aalways exists a face of the iteration space and a set of tiling=0Ahyperplanes such that all tiles along that face can be started=0Aconcurrently. This provides load balance and maximizes parallelism.=0AHowever, existing automatic tiling frameworks often choose hyperplanes=0Athat lead to pipelined start-up and load imbalance. We address this =0Aissue with a new tiling technique that ensures concurrent start-up as =0Awell as perfect load-balance whenever possible. We first provide =0Anecessary and sufficient conditions on tiling hyperplanes to enable =0Aconcurrent start for programs with affine data accesses. We then provide =0Aan approach to find such hyperplanes. Experimental evaluation on a =0A12-core Intel Westmere shows that our code is able to outperform a tuned =0Adomain-specific stencil code generator by 4 to 20 percent, and previous =0Acompiler techniques by a factor of 2x to 10.14x.
SUMMARY:Tiling Stencil Computations to Maximize Parallelism
PRIORITY:3
END:VEVENT
END:VCALENDAR