SC12 Presentation - Design and Modeling of a Non-Blocking Checkpointing System

SCHEDULE: NOV 10-16, 2012

Design and Modeling of a Non-Blocking Checkpointing System

SESSION: Checkpointing

EVENT TYPE: Papers

TIME: 2:30PM - 3:00PM

SESSION CHAIR: Frank Mueller

AUTHOR(S): Kento Sato, Adam Moody, Kathryn Mohror, Todd Gamblin, Bronis R. de Supinski, Naoya Maruyama, Satoshi Matsuoka

ROOM: 255-EF

ABSTRACT:
As the capability and component count of systems increase, the mean time between failures (MTBF) correspondingly decreases. Typically, applications tolerate failures with checkpoint/restart using a parallel file system (PFS). While simple, this approach suffers from high overhead due to contention for PFS resources. A promising solution to this problem is multi-level checkpointing. However, while multi-level checkpointing is successful on today's machines, it is not expected to be sufficient for exascale-class machines, where total memory sizes and failure rates are predicted to be orders of magnitude higher. Our solution to this problem is a system that combines the benefits of non-blocking and multi-level checkpointing. In this paper, we present the design of our system and a model describing its performance. Our experiments show that our system can improve efficiency by 1.1× to 2.0× on future machines. Additionally, applications using our checkpointing system can achieve high efficiency even when using a PFS with lower bandwidth.

Chair/Author Details:

Frank Mueller (Chair) - North Carolina State University

Kento Sato - Tokyo Institute of Technology

Adam Moody - Lawrence Livermore National Laboratory

Kathryn Mohror - Lawrence Livermore National Laboratory

Todd Gamblin - Lawrence Livermore National Laboratory

Bronis R. de Supinski - Lawrence Livermore National Laboratory

Naoya Maruyama - RIKEN

Satoshi Matsuoka - Tokyo Institute of Technology
