BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121113T180000Z
DTEND:20121113T183000Z
LOCATION:355-EF
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Deduplication is a storage-saving technique that has proven successful in backup environments. On a file system, a single data block might be stored multiple times across different files; for example, multiple versions of a file might exist that are mostly identical. With deduplication, this replication is localized and the redundancy is removed.=0AThis paper presents the first study on the potential of data deduplication in HPC centers, which are among the most demanding storage producers. We have quantitatively assessed this potential for capacity reduction for four data centers, analyzing over 1212 TB of file system data. The evaluation shows that typically 20% to 30% of this online data could be removed by applying data deduplication techniques, peaking at up to 70% for some data sets. Interestingly, this reduction can only be achieved by a subfile deduplication approach, while approaches based on whole-file comparisons lead to only small capacity savings.
SUMMARY:A Study on Data Deduplication in HPC Storage Systems
PRIORITY:3
END:VEVENT
END:VCALENDAR