BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121113T233000Z
DTEND:20121114T000000Z
LOCATION:255-BC
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Performance analysis of parallel scientific codes is becoming increasingly difficult due to the rapidly growing complexity of applications and architectures. Existing tools fall short in providing intuitive views that facilitate the process of performance debugging and tuning. In this paper, we extend recent ideas of projecting and visualizing performance data for faster, more intuitive analysis of applications. We collect detailed per-level and per-phase measurements in a dynamically load-balanced, structured AMR library and relate the information back to the application's communication structure. We show how our projections and visualizations lead to a simple diagnosis of and mitigation strategy for a previously elusive scaling bottleneck in the library that is hard to detect using conventional tools. Our new insights have resulted in a 22% performance improvement for a 65,536-core run on an IBM Blue Gene/P system.
SUMMARY:Novel Views of Performance Data to Analyze Large-Scale Adaptive Applications
PRIORITY:3
END:VEVENT
END:VCALENDAR