BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121114T173000Z
DTEND:20121114T180000Z
LOCATION:355-EF
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Lattice Quantum Chromodynamics (QCD) is one of the most challenging applications running on massively parallel supercomputers. Reproducing the underlying physical phenomena on a supercomputer demands precise simulation, which in turn requires well-optimized, scalable code. We have optimized lattice QCD programs on Blue Gene family supercomputers and demonstrated their strength for lattice QCD simulation. Here we optimize for the third-generation Blue Gene/Q supercomputer by (i) changing the data layout, (ii) exploiting new SIMD instruction sets, and (iii) pipelining boundary data exchange to overlap communication and calculation. The optimized lattice QCD program shows excellent weak scalability on the large-scale Blue Gene/Q system: with 16 racks we sustained 1.08 Pflops, 32.1% of the theoretical peak performance, including the conjugate gradient solver routines.
SUMMARY:Peta-Scale Lattice Quantum Chromodynamics on a Blue Gene/Q Supercomputer
PRIORITY:3
END:VEVENT
END:VCALENDAR