SC12 Presentation

SCHEDULE: NOV 10-16, 2012

Design and Analysis of Data Management in Scalable Parallel Scripting

SESSION: Big Data

EVENT TYPE: Papers

TIME: 1:30PM - 2:00PM

SESSION CHAIR: Dennis Gannon

AUTHOR(S): Zhao Zhang, Daniel S. Katz, Justin M. Wozniak, Allan Espinosa, Ian Foster

ROOM: 255-EF

ABSTRACT:
We seek to enable efficient large-scale parallel execution of applications in which a shared filesystem abstraction is used to couple many tasks. Such parallel scripting (many-task computing, MTC) applications suffer poor performance and utilization on large parallel computers due to the volume of filesystem I/O and a lack of appropriate optimizations in the shared filesystem. Thus, we design and implement a scalable MTC data management system that uses aggregated compute node local storage for more efficient data movement strategies. We co-design the data management system with the data-aware scheduler to enable dataflow pattern identification and automatic optimization. The framework reduces the time-to-solution of parallel stages of an astronomy data analysis application, Montage, by 83.2% on 512 cores, decreases the time-to-solution of a seismology application, CyberShake, by 7.9% on 2,048 cores, and delivers BLAST performance better than mpiBLAST at various scales up to 32,768 cores, while preserving the flexibility of the original BLAST application.

Chair/Author Details:

Dennis Gannon (Chair) - Microsoft Corporation

Zhao Zhang - University of Chicago

Daniel S. Katz - University of Chicago

Justin M. Wozniak - Argonne National Laboratory

Allan Espinosa - University of Chicago

Ian Foster - University of Chicago
