BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121115T203000Z
DTEND:20121115T210000Z
LOCATION:255-BC
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: The NAS Parallel Benchmarks (NPB) are a well-known suite of benchmarks that proxy scientific computing applications. They specify several problem sizes that represent how such applications may run on HPC systems of different sizes. However, even the largest problem (Class F) is still far too small to properly exercise a Petascale supercomputer. Our work shows how one may scale the Block Tridiagonal (BT) NPB from today's size to Petascale and Exascale computing systems. In this paper we discuss the pros and cons of various ways of scaling. We discuss how scaling BT would impact computation, memory access, and communications, and highlight the expected bottleneck, which turns out to be not memory or communication bandwidth, but latency. We present two complementary ways to overcome these latency obstacles. We also describe a practical method to gather approximate performance data for BT at exascale on actual hardware, without requiring an exascale system.
SUMMARY:Extending the BT NAS Parallel Benchmark to Exascale Computing
PRIORITY:3
END:VEVENT
END:VCALENDAR