BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121114T210000Z
DTEND:20121114T213000Z
LOCATION:355-EF
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Achieving good scaling =
for fine-grained, communication-intensive=0A=
applications on modern supercomputers remains challenging. In our=0A=
previous work, we have shown that such an application --- NAMD ---=0A=
scales well on the full Jaguar XT5 without long-range interactions;=0A=
yet, with them, the speedup falters beyond 64K cores. Although the=0A=
new Gemini interconnect on Cray XK6 has improved network performance,=0A=
the challenges remain, and are likely to remain for other such=0A=
networks as well. We analyze communication=0A=
bottlenecks in NAMD and its CHARM++ runtime, using the Projections =
performance analysis tool.=0A=
Based on the analysis, we optimize the runtime, built=0A=
on the uGNI library for Gemini.=0A=
We present several techniques to improve the fine-grained=0A=
communication. Consequently, the performance of running 92,224-atom=0A=
ApoA1 on GPUs is improved by 36%. For 100-million-atom STMV, we=0A=
improve upon the prior Jaguar XT5 result of 26 ms/step to 13 ms/step=0A=
using 298,992 cores on Titan XK6.
SUMMARY:Optimizing Fine-Grained Communication in a Biomolecular Simulation Application on Cray XK6
PRIORITY:3
END:VEVENT
END:VCALENDAR