SC12 Presentation

SCHEDULE: NOV 10-16, 2012

On Distributed File Tree Walk of Parallel File Systems

SESSION: Big Data

EVENT TYPE: Papers

TIME: 2:30PM - 3:00PM

SESSION CHAIR: Dennis Gannon

AUTHOR(S): Jharrod LaFon, Satyajayant Misra, Jon Bringhurst

ROOM: 255-EF

ABSTRACT:
Supercomputers generate vast amounts of data, typically organized into large directory hierarchies on parallel file systems. While supercomputing applications are parallel, the tools used to process their data, which require complete directory traversals, are typically serial. We present an algorithm framework and three fully distributed algorithms for traversing large parallel file systems and performing file operations in parallel. The first algorithm introduces a randomized work-stealing scheduler; the second improves on the first with topology awareness; and the third improves on the second by using a hybrid approach. We have tested our implementation on Cielo, a 1.37-petaflop supercomputer at Los Alamos National Laboratory, and its 7-petabyte file system. Test results show that our algorithms execute orders of magnitude faster than state-of-the-art algorithms while achieving ideal load balancing and low communication cost. We present performance insights from the use of our algorithms in production systems at LANL, where they perform daily file system operations.
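
The abstract does not give the algorithms themselves, so the following is only an illustrative sketch of the general idea behind a randomized work-stealing tree walk, not the authors' implementation. It simulates several workers inside a single Python process rather than running across the nodes of a parallel machine. The function name work_stealing_walk, its parameters, and the steal-half policy are all assumptions made for this example.

```python
import os
import random
from collections import deque


def work_stealing_walk(root, num_workers=4, seed=0):
    """Simulate a randomized work-stealing tree walk in one process.

    Hypothetical helper for illustration only; the paper's algorithms are
    distributed, and their details are not given in the abstract.
    Returns (paths_found, items_processed_per_worker).
    """
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_workers)]
    queues[0].append(root)        # all work starts on worker 0
    counts = [0] * num_workers    # items processed by each worker
    found = []

    while any(queues):            # stop once every queue is empty
        for rank, queue in enumerate(queues):
            if not queue:
                # Idle worker: pick a random victim and steal half its queue
                # (assumed policy). A victim always keeps at least one item.
                victim = rng.randrange(num_workers)
                if victim != rank:
                    for _ in range(len(queues[victim]) // 2):
                        queue.append(queues[victim].popleft())
                continue          # a steal attempt uses up this turn
            path = queue.pop()
            found.append(path)
            counts[rank] += 1
            # Directories expand into new work items; symlinks are not followed.
            if os.path.isdir(path) and not os.path.islink(path):
                with os.scandir(path) as entries:
                    for entry in entries:
                        queue.append(entry.path)
    return found, counts
```

Calling work_stealing_walk(".", num_workers=8) returns every path under the current directory together with a per-worker count, which gives a rough view of how evenly the random stealing spread the work. A real distributed version would also need message passing between nodes and a termination-detection protocol, both of which this sketch sidesteps by keeping all queues in one process.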

Chair/Author Details:

Dennis Gannon (Chair) - Microsoft Corporation

Jharrod LaFon - Los Alamos National Laboratory

Satyajayant Misra - New Mexico State University

Jon Bringhurst - Los Alamos National Laboratory

