BEGIN:VCALENDAR PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN VERSION:1.0 BEGIN:VEVENT DTSTART:20121111T153000Z DTEND:20121112T000000Z LOCATION:355-E DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: This tutorial is suitable for attendees with an intermediate-level background in parallel programming with MPI and some background in GPU programming with CUDA or OpenCL. It will provide a comprehensive overview of optimization techniques to port, analyze, and accelerate applications on scalable heterogeneous computing systems using MPI together with OpenCL, CUDA, and OpenACC directive-based compilers. First, we will review our methodology and software environment for identifying and selecting portions of applications to accelerate with a GPU, motivated by several application case studies. Second, we will present an overview of several performance and correctness tools, which provide performance measurement, profiling, and tracing information about applications running on these systems. Third, we will present a set of best practices for optimizing these applications, including GPU and NUMA optimization techniques and the optimization of interactions between the MPI and GPU programming models. A hands-on session will be conducted on the NSF Keeneland System after each part, giving participants the opportunity to investigate techniques and performance optimizations on such a system. Existing tutorial codes and benchmark suites will be provided to facilitate individual discovery. Additionally, participants may bring and work on their own applications. 
SUMMARY:Scalable Heterogeneous Computing on GPU Clusters PRIORITY:3 END:VEVENT END:VCALENDAR