SCHEDULE: NOV 10-16, 2012
Fault Prediction Under the Microscope - A Closer Look Into HPC Systems
SESSION: Fault Detection and Analysis
EVENT TYPE: Papers
TIME: 11:00AM - 11:30AM
SESSION CHAIR: Pedro C. Diniz
AUTHOR(S): Ana Gainaru, Franck Cappello, William Kramer, Marc Snir
A large percentage of the computing capacity in today's large high-performance computing systems is wasted due to failures. As a consequence, current research focuses on fault tolerance strategies that aim to minimize the effects of faults on applications. A complement to this approach is failure avoidance, where the occurrence of a fault is predicted and preventive measures are taken. For this, monitoring systems require a reliable prediction system that reports what faults will occur and at what location. In this paper, we merge signal analysis concepts with data mining techniques to extend the ELSA toolkit with an adaptive and overall more efficient prediction module. To this end, a large part of the paper is devoted to a detailed analysis of the prediction method, applied to two large-scale systems. Furthermore, we analyze the prediction's impact on current checkpointing strategies and highlight future improvements and directions.
Pedro C. Diniz (Chair) - University of Southern California
Ana Gainaru - University of Illinois at Urbana-Champaign
Franck Cappello - INRIA
William Kramer - National Center for Supercomputing Applications
Marc Snir - University of Illinois at Urbana-Champaign