BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121114T001500Z
DTEND:20121114T020000Z
LOCATION:East Entrance
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Plasma turbulence research based on 5D gyrokinetic simulations is one of the most critical and demanding issues in fusion science. To pioneer new physics regimes in both problem size and time scale, improved strong scaling is essential. Overlapping computation and communication is a promising approach, but it often fails in practical applications with conventional MPI libraries. In this work, this classical issue is clarified and resolved by developing communication overlap techniques with mpi_test and communication threads, which work even on conventional MPI libraries and hardware. These techniques dramatically improve the parallel efficiency of the gyrokinetic Eulerian code GT5D on K and Helios, which adopt dedicated and commodity networks, respectively. On K, excellent strong scaling is confirmed beyond 10^5 cores while maintaining a peak ratio of 10% (307 TFlops at 196,608 cores), and simulations for ITER-size fusion devices are significantly accelerated.
SUMMARY:Communication Overlap Techniques for Improved Strong Scaling of Gyrokinetic Eulerian Code Beyond 100k Cores on K-Computer
PRIORITY:3
END:VEVENT
END:VCALENDAR