SC12 Presentation - A Study on Data Deduplication in HPC Storage Systems

SCHEDULE: NOV 10-16, 2012

A Study on Data Deduplication in HPC Storage Systems

SESSION: Analysis of I/O and Storage

EVENT TYPE: Papers

TIME: 11:00AM - 11:30AM

SESSION CHAIR: Robert B. Ross

AUTHOR(S): Dirk Meister, Jürgen Kaiser, Andre Brinkmann, Toni Cortes, Michael Kuhn, Julian Kunkel

ROOM: 355-EF

ABSTRACT:
Deduplication is a storage-saving technique that is successful in backup environments. On a file system, a single data block might be stored multiple times across different files; for example, multiple versions of a file might exist that are mostly identical. With deduplication, this replicated data is located and the redundancy is removed. This paper presents the first study on the potential of data deduplication in HPC centers, which are among the most demanding storage producers. We have quantitatively assessed this potential for capacity reduction in four data centers, analyzing over 1212 TB of file system data. The evaluation shows that typically 20% to 30% of this online data could be removed by applying data deduplication techniques, peaking at up to 70% for some data sets. Interestingly, this reduction can only be achieved by a subfile deduplication approach, while approaches based on whole-file comparisons lead to only small capacity savings.
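
The difference between whole-file and subfile deduplication described in the abstract can be illustrated with a small sketch. This is not the authors' method: the abstract does not say how data is split or fingerprinted, so the fixed-size chunking, the SHA-256 fingerprints, the tiny 8-byte chunk size, and the function names `whole_file_savings` and `chunk_savings` are all assumptions chosen for illustration.

```python
import hashlib

# Hypothetical chunk size, kept tiny so the toy data below is easy to follow.
CHUNK_SIZE = 8


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest used to detect identical content."""
    return hashlib.sha256(data).hexdigest()


def whole_file_savings(files: list[bytes]) -> float:
    """Fraction of bytes removable if only identical whole files are merged."""
    total = sum(len(f) for f in files)
    unique: dict[str, int] = {}
    for f in files:
        # Store each distinct file once, keyed by its fingerprint.
        unique.setdefault(fingerprint(f), len(f))
    stored = sum(unique.values())
    return (total - stored) / total if total else 0.0


def chunk_savings(files: list[bytes], chunk_size: int = CHUNK_SIZE) -> float:
    """Fraction of bytes removable if identical fixed-size chunks are merged."""
    total = sum(len(f) for f in files)
    unique: dict[str, int] = {}
    for f in files:
        for start in range(0, len(f), chunk_size):
            chunk = f[start:start + chunk_size]
            # Store each distinct chunk once, regardless of which file holds it.
            unique.setdefault(fingerprint(chunk), len(chunk))
    stored = sum(unique.values())
    return (total - stored) / total if total else 0.0


# Two versions of a file that differ only in their last 8 bytes.
version_1 = b"AAAAAAAA" + b"BBBBBBBB" + b"CCCCCCCC" + b"DDDDDDDD"
version_2 = b"AAAAAAAA" + b"BBBBBBBB" + b"CCCCCCCC" + b"EEEEEEEE"
files = [version_1, version_2]

print(f"whole-file savings: {whole_file_savings(files):.1%}")  # 0.0%
print(f"chunk-level savings: {chunk_savings(files):.1%}")      # 37.5%
```

In this toy case the two versions are not byte-identical, so whole-file comparison saves nothing, while chunk-level comparison stores the three shared chunks only once. The percentages are properties of the toy data and say nothing about the figures reported in the paper.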

Chair/Author Details:

Robert B. Ross (Chair) - Argonne National Laboratory

Dirk Meister - Johannes Gutenberg University Mainz

Jürgen Kaiser - Johannes Gutenberg University Mainz

Andre Brinkmann - Johannes Gutenberg University Mainz

Toni Cortes - Barcelona Supercomputing Center

Michael Kuhn - University of Hamburg

Julian Kunkel - University of Hamburg
