BEGIN:VCALENDAR PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN VERSION:1.0 BEGIN:VEVENT DTSTART:20121114T001500Z DTEND:20121114T020000Z LOCATION:East Entrance DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Current and planned computer systems present challenges for scientific programming. Memory capacity and bandwidth are limiting performance as floating point capability increases due to more cores per processor and wider vector units. Effectively using hardware requires finding greater parallelism in programs while using relatively less memory. In this poster, we present how we tuned the Livermore Unstructured Lagrange Explicit Shock Hydrodynamics proxy application for on-node performance resulting in 62% fewer memory reads, a 19% smaller memory footprint, 770% more floating point operations vectorizing and less than 0.1% serial section runtime. Tests show serial runtime decreases of up to 57% and parallel runtime reductions of up to 75%. We are also applying these optimizations to GPUs and a subset of ALE3D, from which the proxy application was derived. So far we achieve up to a 1.9x speedup on GPUs, and a 13% runtime reduction in the application for the same problem. SUMMARY:Memory and Parallelism Exploration using the LULESH Proxy Application PRIORITY:3 END:VEVENT END:VCALENDAR BEGIN:VCALENDAR PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN VERSION:1.0 BEGIN:VEVENT DTSTART:20121114T001500Z DTEND:20121114T020000Z LOCATION:East Entrance DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Current and planned computer systems present challenges for scientific programming. Memory capacity and bandwidth are limiting performance as floating point capability increases due to more cores per processor and wider vector units. Effectively using hardware requires finding greater parallelism in programs while using relatively less memory. In this poster, we present how we tuned the Livermore Unstructured Lagrange Explicit Shock Hydrodynamics proxy application for on-node performance resulting in 62% fewer memory reads, a 19% smaller memory footprint, 770% more floating point operations vectorizing and less than 0.1% serial section runtime. Tests show serial runtime decreases of up to 57% and parallel runtime reductions of up to 75%. We are also applying these optimizations to GPUs and a subset of ALE3D, from which the proxy application was derived. So far we achieve up to a 1.9x speedup on GPUs, and a 13% runtime reduction in the application for the same problem. SUMMARY:Memory and Parallelism Exploration using the LULESH Proxy Application PRIORITY:3 END:VEVENT END:VCALENDAR