SC12 Presentation - Alleviating Scalability Issues of Checkpointing Protocols

SCHEDULE: NOV 10-16, 2012

Alleviating Scalability Issues of Checkpointing Protocols

SESSION: Checkpointing

EVENT TYPE: Papers

TIME: 2:00PM - 2:30PM

SESSION CHAIR: Frank Mueller

AUTHOR(S): Rolf Riesen, Kurt Ferreira, Dilma Da Silva, Pierre Lemarinier, Dorian Arnold, Patrick G. Bridges

ROOM: 255-EF

ABSTRACT:
Current fault tolerance protocols are not sufficiently scalable for the exascale era. The most widely used method, coordinated checkpointing, places enormous demands on the I/O subsystem and imposes frequent synchronizations. Uncoordinated protocols use message logging, which introduces message rate limitations or undesired memory and storage requirements to hold payload and event logs. In this paper we propose a combination of several techniques, namely coordinated checkpointing, optimistic message logging, and a protocol that glues them together. This combination eliminates some of the drawbacks of each individual approach and proves to be an alternative for many types of exascale applications. We evaluate performance and scaling characteristics of this combination using simulation and a partial implementation. While not a universal solution, the combined protocol is suitable for a large range of existing and future applications that use coordinated checkpointing, and it enhances their scalability.

Chair/Author Details:

Frank Mueller (Chair) - North Carolina State University

Rolf Riesen - IBM

Kurt Ferreira - Sandia National Laboratories

Dilma Da Silva - IBM

Pierre Lemarinier - IBM

Dorian Arnold - University of New Mexico

Patrick G. Bridges - University of New Mexico
