BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121115T203000Z
DTEND:20121115T210000Z
LOCATION:255-EF
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: We seek to enable efficient large-scale parallel execution of applications in which =0Aa shared filesystem abstraction is used to couple many tasks. Such parallel scripting (Many-Task-Computing) applications suffer poor =0Aperformance and utilization on large parallel computers due to the volume of filesystem I/O and a lack of appropriate=0Aoptimizations in the shared filesystem. Thus, we design and implement a scalable MTC data =0Amanagement system that uses aggregated compute node local storage for more=0Aefficient data movement strategies. We =0Aco-design the data management system with the data-aware scheduler to enable =0Adataflow pattern identification and automatic optimization. The framework reduces the =0Atime-to-solution of parallel stages of an astronomy data analysis application, Montage, by 83.2% on 512 =0Acores, decreases time-to-solution of a seismology application, CyberShake, by 7.9% =0Aon 2,048 cores, and delivers BLAST performance better than mpiBLAST at various =0Ascales up to 32,768 cores, while preserving the flexibility of the original BLAST =0Aapplication.
SUMMARY:Design and Analysis of Data Management in Scalable Parallel Scripting
PRIORITY:3
END:VEVENT
END:VCALENDAR