SCHEDULE: NOV 10-16, 2012
Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer
SESSION: Fast Algorithms
EVENT TYPE: Papers
TIME: 11:30AM - 12:00PM
SESSION CHAIR: Torsten Hoefler
AUTHOR(S): Akira Nukada, Kento Sato, Satoshi Matsuoka
ROOM: 255-BC
ABSTRACT:
For scalable 3-D FFT computation using multiple GPUs, efficient all-to-all communication between GPUs is the most important factor in good performance. Implementations built on point-to-point MPI library functions and CUDA memory copy APIs typically exhibit very large overheads, especially for small message sizes in all-to-all communication between many nodes. We propose several schemes to minimize these overheads, including the use of a lower-level InfiniBand API to effectively overlap intra- and inter-node communication, as well as auto-tuning strategies to control scheduling and determine rail assignments. As a result, we achieve very good strong scalability as well as good performance, up to 4.8 TFLOPS in double precision using 256 nodes (768 GPUs) of the TSUBAME 2.0 supercomputer.
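To illustrate why all-to-all communication dominates a distributed 3-D FFT, the following is a minimal sketch, not the authors' implementation. It assumes a slab decomposition (the abstract does not say which decomposition the paper uses), simulates the ranks as entries of a Python list instead of GPUs or MPI processes, and uses a naive 1-D DFT as a stand-in for a GPU FFT library. All function names here are hypothetical.

```python
import cmath

def dft1d(x):
    """Naive O(n^2) 1-D DFT; a stand-in for a real FFT library call."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def all_to_all(send):
    """Simulated exchange: send[src][dst] becomes recv[dst][src]."""
    p = len(send)
    return [[send[src][dst] for src in range(p)] for dst in range(p)]

def distributed_dft3d(a, p):
    """3-D DFT of an n*n*n cube a[x][y][z], with x-slabs split over p ranks.

    Assumes p divides n.
    """
    n = len(a)
    s = n // p  # slab thickness per rank
    # Rank r owns the x-planes r*s .. (r+1)*s - 1 (copied so `a` is untouched).
    local = [[[row[:] for row in a[x]] for x in range(r * s, (r + 1) * s)]
             for r in range(p)]
    # Step 1: purely local transforms along z, then along y.
    for r in range(p):
        for plane in local[r]:
            for y in range(n):
                plane[y] = dft1d(plane[y])
            for z in range(n):
                col = dft1d([plane[y][z] for y in range(n)])
                for y in range(n):
                    plane[y][z] = col[y]
    # Step 2: all-to-all. Rank src sends rank dst the rows whose y index
    # falls in dst's range, so every rank ends up with all x for its y range.
    send = [[[[local[src][xl][y] for y in range(dst * s, (dst + 1) * s)]
              for xl in range(s)]
             for dst in range(p)]
            for src in range(p)]
    recv = all_to_all(send)
    # Step 3: local transforms along x on the redistributed data.
    out = [[[0j] * n for _ in range(n)] for _ in range(n)]
    for r in range(p):
        for yl in range(s):
            for z in range(n):
                pencil = [recv[r][x // s][x % s][yl][z] for x in range(n)]
                col = dft1d(pencil)
                for kx in range(n):
                    out[kx][r * s + yl][z] = col[kx]
    return out

def direct_dft3d(a):
    """Reference 3-D DFT computed directly from the definition."""
    n = len(a)
    r = range(n)
    def w(j, k):
        return cmath.exp(-2j * cmath.pi * j * k / n)
    return [[[sum(a[x][y][z] * w(x, kx) * w(y, ky) * w(z, kz)
                  for x in r for y in r for z in r)
              for kz in r] for ky in r] for kx in r]

# Usage: compare the simulated 2-rank result against the direct transform.
n = 4
cube = [[[complex(x + 2 * y, z - x) for z in range(n)] for y in range(n)]
        for x in range(n)]
got = distributed_dft3d(cube, 2)
ref = direct_dft3d(cube)
err = max(abs(got[i][j][k] - ref[i][j][k])
          for i in range(n) for j in range(n) for k in range(n))
print("max abs difference:", err)
```

In this sketch each rank sends one block to every other rank in step 2, and each block shrinks as the number of ranks grows, which is consistent with the abstract's concern about overheads for small messages between many nodes.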
Chair/Author Details:
Torsten Hoefler (Chair) - ETH Zurich
Akira Nukada - Tokyo Institute of Technology
Kento Sato - Tokyo Institute of Technology
Satoshi Matsuoka - Tokyo Institute of Technology