BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20121112T153000Z
DTEND:20121113T000000Z
LOCATION:255-E
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Datasets are growing larger and larger each year. The goals of this tutorial are to give an introduction to some of the tools and techniques that can be used for managing and analyzing large datasets.=0A=0A1) We will give an introduction to managing datasets using databases, federated databases (Graywulf architectures), NoSQL databases, and distributed file systems, such as Hadoop.=0A=0A2) We will give an introduction to parallel programming frameworks, such as MapReduce, Hadoop streams, pleasantly parallel computation using collections of virtual machines, and related techniques.=0A=0A3) We will show different ways to explore and analyze large datasets managed by Hadoop using open source data analysis tools, such as R.=0A=0AWe will illustrate these technologies and techniques using several case studies, including: the management and analysis of the large datasets produced by next generation sequencing devices, the analysis of astronomy data produced by the Sloan Digital Sky Survey, the analysis of earth science data produced by NASA satellites, and the analysis of netflow data.
SUMMARY:A Tutorial Introduction to Big Data
PRIORITY:3
END:VEVENT
END:VCALENDAR