SCHEDULE: NOV 10-16, 2012
Design and Analysis of Data Management in Scalable Parallel Scripting
SESSION: Big Data
EVENT TYPE: Papers
TIME: 1:30PM - 2:00PM
SESSION CHAIR: Dennis Gannon
AUTHOR(S): Zhao Zhang, Daniel S. Katz, Justin M. Wozniak, Allan Espinosa, Ian Foster
We seek to enable efficient large-scale parallel execution of applications in which a shared filesystem abstraction is used to couple many tasks. Such parallel scripting (many-task computing, MTC) applications suffer poor performance and utilization on large parallel computers due to the volume of filesystem I/O and a lack of appropriate optimizations in the shared filesystem. Thus, we design and implement a scalable MTC data management system that uses aggregated compute-node local storage for more efficient data movement strategies. We co-design the data management system with the data-aware scheduler to enable dataflow pattern identification and automatic optimization. The framework reduces the time-to-solution of the parallel stages of an astronomy data analysis application, Montage, by 83.2% on 512 cores; decreases the time-to-solution of a seismology application, CyberShake, by 7.9% on 2,048 cores; and delivers BLAST performance better than mpiBLAST at various scales up to 32,768 cores, while preserving the flexibility of the original BLAST application.
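The abstract does not say how the data-aware scheduler makes its decisions. As a rough illustration of the general idea only, the sketch below places each task on the node whose local storage already holds the most of that task's input data, falling back to the least-loaded node otherwise. The function `place_tasks`, its arguments, and the task, file, and node names are all hypothetical and are not taken from the authors' system.

```python
from collections import defaultdict


def place_tasks(tasks, file_locations, nodes):
    """Greedy data-aware placement (illustrative assumption, not the paper's algorithm).

    tasks: dict mapping task name -> list of (file name, size in bytes)
    file_locations: dict mapping file name -> node whose local storage holds it
    nodes: list of node names
    Returns a dict mapping task name -> chosen node.
    """
    load = {node: 0 for node in nodes}  # number of tasks assigned to each node
    placement = {}
    for task, inputs in tasks.items():
        # Sum the input bytes that are already local to each node.
        local_bytes = defaultdict(int)
        for fname, size in inputs:
            node = file_locations.get(fname)
            if node is not None:
                local_bytes[node] += size
        # Prefer the node with the most local input bytes; break ties by lower load.
        best = max(nodes, key=lambda n: (local_bytes[n], -load[n]))
        placement[task] = best
        load[best] += 1
    return placement


# Hypothetical workload: two producers' outputs sit on different nodes,
# and a third task reads both of them.
tasks = {
    "t1": [("a.dat", 100)],
    "t2": [("b.dat", 100)],
    "t3": [("a.dat", 100), ("b.dat", 300)],
}
file_locations = {"a.dat": "node0", "b.dat": "node1"}
placement = place_tasks(tasks, file_locations, ["node0", "node1"])
print(placement)
```

In this toy workload, `t3` lands on `node1` because 300 of its 400 input bytes are already stored there, so only the smaller file would need to move. A real MTC scheduler would also have to account for storage capacity, replication, and task ordering, none of which this sketch models.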
Dennis Gannon (Chair) - Microsoft Corporation
Zhao Zhang - University of Chicago
Daniel S. Katz - University of Chicago
Justin M. Wozniak - Argonne National Laboratory
Allan Espinosa - University of Chicago
Ian Foster - University of Chicago