SCHEDULE: NOV 10-16, 2012
Programming Model Extensions for Resilience in Extreme Scale Computing
SESSION: Research Poster Reception
EVENT TYPE: Posters and Electronic Posters
TIME: 5:15PM - 7:00PM
SESSION CHAIR: Torsten Hoefler
AUTHOR(S): Saurabh Hukerikar, Pedro C. Diniz, Robert F. Lucas
ROOM: East Entrance
ABSTRACT:
System resilience is a key challenge to building extreme scale systems. A large number of HPC applications are inherently resilient, but application programmers lack mechanisms to convey their fault tolerance knowledge to the system. We present a cross-layer approach to resilience in which we propose a set of programming model extensions and develop a runtime inference framework that can reason about the context and significance of faults, as they occur, relative to the application programmer's fault tolerance expectations. Using a set of accelerated fault injection experiments, we demonstrate the validity of our approach on a set of real scientific and engineering codes. Our experiments show that a cross-layer approach, which explicitly engages the programmer in expressing fault tolerance knowledge that is then leveraged across the layers of system abstraction, can significantly improve the dependability of long-running HPC applications.
Chair/Author Details:
Torsten Hoefler (Chair) - ETH Zurich
Saurabh Hukerikar - University of Southern California
Pedro C. Diniz - University of Southern California
Robert F. Lucas - University of Southern California