BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121113T180000Z
DTEND:20121113T183000Z
LOCATION:355-D
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: In this paper we introduce a multi-objective auto-tuning framework comprising=0Acompiler and runtime components. Focusing on individual code regions, our=0Acompiler uses a novel search technique to compute a set of optimal solutions,=0Awhich are encoded into a multi-versioned executable. This enables the runtime=0Asystem to choose specifically tuned code versions when dynamically adjusting to=0Achanging circumstances.=0A=0AWe demonstrate our method by tuning loop tiling in cache-sensitive parallel programs, optimizing for both runtime and efficiency. Our static optimizer finds solutions matching or surpassing those determined by exhaustively sampling the search space on a regular grid, while using less than 4% of the computational=0Aeffort on average. Additionally, we show that parallelism-aware multi-versioning approaches like our own gain a performance improvement of up to 70% over solutions tuned for only one specific number of threads.
SUMMARY:A Multi-Objective Auto-Tuning Framework for Parallel Codes
PRIORITY:3
END:VEVENT
END:VCALENDAR