BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121114T001500Z
DTEND:20121114T020000Z
LOCATION:East Entrance
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: The amount of overhead that noise amplification causes can increase dramatically as we scale an application to a very large number of processes (10,000 or more). In prior work, we introduced lightweight scheduling, which combines dynamic and static task scheduling to reduce the total number of dequeue operations while still absorbing noise on a node. In this work, we exploit a priori knowledge of per-process MPI slack to reduce the static fraction for those MPI processes that are known not to be on the critical path and thus are unlikely to amplify noise. This technique gives an 11% performance gain over the original lightweight scheduling (a 17% gain over static scheduling) when we run an AMG application with up to 16,384 processes (1024 nodes) on a NUMA cluster, and we are able to project further performance gains on machines with node counts beyond 10,000. (More details are on the poster in dynHybSummary.pdf.)
SUMMARY:Slack-Conscious Lightweight Loop Scheduling for Scaling Past the Noise Amplification Problem
PRIORITY:3
END:VEVENT
END:VCALENDAR