BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121114T001500Z
DTEND:20121114T020000Z
LOCATION:East Entrance
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Clock rates remain flat while transistor density increases, so microprocessor designers are providing more parallelism on a chip by increasing vector length and core count. For example, the Intel Westmere architecture has a vector length of four floats (128 bits) and six cores, compared to eight floats (256 bits) and eight cores on the Intel Sandy Bridge. Applications must get good vector and shared-memory performance to leverage these hardware advances. Dissipative Particle Dynamics (DPD) is analogous to traditional molecular dynamics techniques applied to mesoscale simulations. We analyzed and restructured an existing DPD implementation to improve vector and OpenMP performance for the Intel Xeon and MIC architectures. Using this experience, we designed an efficient partitioned global address space (PGAS) implementation with the Global Arrays Toolkit. We present performance results on representative architectures.
SUMMARY:Application Restructuring for Vectorization and Parallelization: A Case Study
PRIORITY:3
END:VEVENT
END:VCALENDAR