SCHEDULE: NOV 10-16, 2012
A Study on Data Deduplication in HPC Storage Systems
SESSION: Analysis of I/O and Storage
EVENT TYPE: Papers
TIME: 11:00AM - 11:30AM
SESSION CHAIR: Robert B. Ross
AUTHOR(S): Dirk Meister, Jürgen Kaiser, Andre Brinkmann, Toni Cortes, Michael Kuhn, Julian Kunkel
ROOM: 355-EF
ABSTRACT:
Deduplication is a storage-saving technique that is successful in backup environments. On a file system, a single data block might be stored multiple times across different files; for example, multiple versions of a file might exist that are mostly identical. With deduplication, this data replication is located and the redundancy is removed.
This paper presents the first study on the potential of data deduplication in HPC centers, which are among the most demanding storage producers. We have quantitatively assessed this potential for capacity reduction for four data centers, analyzing over 1212 TB of file system data. The evaluation shows that typically 20% to 30% of this online data could be removed by applying data deduplication techniques, peaking at up to 70% for some data sets. Interestingly, this reduction can only be achieved by a subfile deduplication approach, while approaches based on whole-file comparisons lead to only small capacity savings.
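The difference between whole-file and subfile deduplication can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how the data was chunked or fingerprinted, so fixed-size chunks, the 4096-byte `CHUNK_SIZE`, SHA-256 fingerprints, and the function names below are all assumptions made for illustration, and the toy files in `demo_files` are invented data, not the paper's.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes (an assumption)


def whole_file_savings(files):
    """Fraction of bytes saved when only byte-identical files are deduplicated."""
    total = sum(len(data) for data in files)
    seen = set()
    stored = 0
    for data in files:
        digest = hashlib.sha256(data).digest()
        if digest not in seen:  # first time this exact file content is seen
            seen.add(digest)
            stored += len(data)
    return 1 - stored / total if total else 0.0


def chunk_savings(files, chunk_size=CHUNK_SIZE):
    """Fraction of bytes saved when identical fixed-size chunks are stored once."""
    total = sum(len(data) for data in files)
    seen = set()
    stored = 0
    for data in files:
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:  # only new chunk contents consume space
                seen.add(digest)
                stored += len(chunk)
    return 1 - stored / total if total else 0.0


def demo_files():
    """Two invented versions of a file that differ only in their last chunk."""
    shared = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
    version_1 = shared + b"a" * CHUNK_SIZE
    version_2 = shared + b"b" * CHUNK_SIZE
    return [version_1, version_2]


if __name__ == "__main__":
    files = demo_files()
    print(f"whole-file savings: {whole_file_savings(files):.0%}")
    print(f"chunk-level savings: {chunk_savings(files):.0%}")
```

On this toy input, whole-file comparison saves nothing because the two versions are not byte-identical, while the chunk-level pass stores the four shared chunks once and saves 40% of the bytes. Real systems often use content-defined chunking rather than fixed-size chunks, so that an insertion near the start of a file does not shift every later chunk boundary.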
Chair/Author Details:
Robert B. Ross (Chair) - Argonne National Laboratory
Dirk Meister - Johannes Gutenberg University Mainz
Jürgen Kaiser - Johannes Gutenberg University Mainz
Andre Brinkmann - Johannes Gutenberg University Mainz
Toni Cortes - Barcelona Supercomputing Center
Michael Kuhn - University of Hamburg
Julian Kunkel - University of Hamburg