SC12 Home > SC12 Schedule > SC12 Presentation - Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer

SCHEDULE: NOV 10-16, 2012


Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer

SESSION: Fast Algorithms

EVENT TYPE: Papers

TIME: 11:30AM - 12:00PM

SESSION CHAIR: Torsten Hoefler

AUTHOR(S): Akira Nukada, Kento Sato, Satoshi Matsuoka

ROOM: 255-BC

ABSTRACT:
For scalable 3-D FFT computation using multiple GPUs, efficient all-to-all communication between GPUs is the most important factor in achieving good performance. Implementations built on point-to-point MPI library functions and CUDA memory copy APIs typically exhibit very large overheads, especially for the small message sizes that arise in all-to-all communication among many nodes. We propose several schemes to minimize these overheads, including the use of a lower-level InfiniBand API to effectively overlap intra- and inter-node communication, as well as auto-tuning strategies to control scheduling and determine rail assignments. As a result, we achieve very good strong scalability as well as good performance: up to 4.8 TFLOPS in double precision using 256 nodes (768 GPUs) of the TSUBAME 2.0 supercomputer.
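
As a rough illustration of why message sizes shrink in strong scaling, the sketch below estimates the average per-pair message size in an all-to-all exchange. It assumes a simple model that is not taken from the paper: an n x n x n array of double-precision complex values is spread evenly over p GPUs, and each GPU sends an equal share of its data to every GPU. The function names and the problem size n = 512 are hypothetical choices for illustration only.

```python
# Assumed model (not the authors' scheme): an n x n x n array of
# double-precision complex values is divided evenly among p GPUs, and one
# all-to-all redistributes it so each GPU sends an equal share to each GPU.

BYTES_PER_COMPLEX_DOUBLE = 16  # two 8-byte floating-point values


def alltoall_message_bytes(n: int, p: int) -> float:
    """Average bytes one GPU sends to one peer under the assumed model."""
    total_bytes = n ** 3 * BYTES_PER_COMPLEX_DOUBLE
    per_gpu_bytes = total_bytes / p  # data resident on each GPU
    return per_gpu_bytes / p         # share destined for each peer


def gpus_per_node(gpus: int, nodes: int) -> int:
    """GPUs per node, assuming GPUs are spread evenly over the nodes."""
    return gpus // nodes


if __name__ == "__main__":
    n = 512  # hypothetical problem size, chosen only for illustration
    for p in (48, 192, 768):
        print(f"p={p:4d}  bytes per message ~ {alltoall_message_bytes(n, p):.0f}")
    # 768 GPUs on 256 nodes, the configuration quoted in the abstract
    print("GPUs per node:", gpus_per_node(768, 256))
```

Under this model the per-pair message size falls with the square of the GPU count, so doubling the number of GPUs at a fixed problem size quarters each message. That is consistent with the abstract's observation that per-message overheads dominate when many nodes participate.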

Chair/Author Details:

Torsten Hoefler (Chair) - ETH Zurich

Akira Nukada - Tokyo Institute of Technology

Kento Sato - Tokyo Institute of Technology

Satoshi Matsuoka - Tokyo Institute of Technology

