BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121113T183000Z
DTEND:20121113T190000Z
LOCATION:255-EF
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Graph traversal is used in many fields, including social networks, bioinformatics and HPC. The push for HPC machines to be rated in "GigaTEPS" (billions of traversed edges per second) has led to the Graph500 benchmark.=0A=0A=
Graph traversal is well optimized for single-node CPUs. However, current cluster implementations suffer from high-latency, large-volume inter-node communication, resulting in low performance and energy efficiency. In this work, we use novel low-overhead data-compression techniques to reduce communication volumes, along with new latency-hiding techniques. Keeping the same optimized single-node algorithm, we obtain a 6.6X performance improvement and order-of-magnitude energy savings over state-of-the-art techniques.=0A=0A=
Our Graph500 implementation achieves 115 GigaTEPS on a 320-node Intel Endeavor cluster with E5-2700 Sandy Bridge nodes, matching the second-ranked result in the November 2011 Graph500 list with 5.6X fewer nodes. Our per-node performance drops by only 1.8X relative to optimized single-node implementations, and is the highest in the top 10 of the list. We obtain near-linear scaling with node count. On 1024 Westmere nodes of the NASA Pleiades system, we obtain 195 GigaTEPS.
SUMMARY:Large-Scale Energy-Efficient Graph Traversal - A Path to Efficient Data-Intensive Supercomputing
PRIORITY:3
END:VEVENT
END:VCALENDAR