BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121115T223000Z
DTEND:20121115T230000Z
LOCATION:255-EF
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: The placement of tasks in a parallel application on specific nodes of a supercomputer can significantly impact performance. Traditionally, task mapping has focused on reducing the distance between communicating processes on the physical network. However, for applications that use collectives over sub-communicators, this strategy may not be optimal. Many collectives can benefit from an increase in bandwidth even at the cost of an increase in hop count, especially when sending large messages.=0A=0AWe have developed a tool, Rubik, that provides a simple API to create a wide variety of mappings for structured communication patterns. Rubik supports several operations that can be combined into a large number of unique patterns. Each mapping can be applied to disjoint groups of MPI processes involved in collectives to increase the effective bandwidth. We demonstrate the use of these techniques for improving performance of two parallel codes, pF3D and Qbox, which use collectives over sub-communicators.
SUMMARY:Mapping Applications with Collectives over Sub-Communicators on Torus Networks
PRIORITY:3
END:VEVENT
END:VCALENDAR