BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121114T001500Z
DTEND:20121114T020000Z
LOCATION:East Entrance
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: The Graph500 benchmark =
is designed to evaluate the suitability of supercomputing systems on =
graph algorithms, which are increasingly important in HPC. The timed =
Graph500 kernel, Breadth First Search, exhibits memory access =
patterns typical of these types of applications, with poor spatial =
locality and synchronization between multiple streams of =
execution.=0A=0AThe Graph500 benchmark was ported to a Convey HC-2ex, =
a hybrid-core computer with an Intel host system and a coprocessor =
incorporating four reprogrammable Xilinx FPGAs. The computer =
incorporates a unique memory system designed to sustain high =
bandwidth for random memory accesses. The BFS kernel was implemented =
as a hybrid algorithm with concurrent processing on both the host =
and coprocessor. The early steps use a top-down algorithm on the =
host with results copied to coprocessor memory for use in a =
bottom-up algorithm. The coprocessor uses thousands of threads to =
traverse the graph. The resulting implementation runs at over 11 =
billion TEPS.
SUMMARY:Hybrid Breadth First Search Implementation for Hybrid-Core Computers
PRIORITY:3
END:VEVENT
END:VCALENDAR