BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121115T173000Z
DTEND:20121115T180000Z
LOCATION:355-EF
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Applications often have a sequence of parallel =0Aoperations to be offloaded to graphics processors; each operation can=0Abecome an individual GPU kernel. Developers typically explore=0Adifferent transformations for each kernel. =0AIt is well known that efficient data management is critical in=0Aachieving high GPU performance and that fusing multiple=0Akernels into one may greatly improve data locality. Doing so,=0Ahowever, requires transformations across multiple, potentially=0Anested, parallel loops; at the same time, the original code=0Asemantics must be preserved. Since each=0Akernel may have distinct data access patterns, their combined=0Adataflow can be nontrivial. As a result, the complexity of =0Amulti-kernel transformations often leads to significant effort with no=0Aguarantee of performance benefits.=0A=0AThis paper proposes a dataflow-driven analytical framework to=0Aproject GPU performance for a sequence of parallel operations=0Awithout implementing GPU code or using physical hardware.=0AThe framework also suggests multi-kernel transformations that=0Acan achieve the projected performance.
SUMMARY:Dataflow-Driven GPU Performance Projection for Multi-Kernel Transformations
PRIORITY:3
END:VEVENT
END:VCALENDAR