SCHEDULE: NOV 10-16, 2012
A Tutorial Introduction to Big Data
SESSION: A Tutorial Introduction to Big Data
EVENT TYPE: Tutorials
TIME: 8:30AM - 5:00PM
Presenter(s): Robert Grossman, Alex Szalay, Collin Bennett
Datasets grow larger every year. This tutorial introduces tools and techniques for managing and analyzing large datasets:
1) We will introduce ways to manage datasets using databases, federated databases (Graywulf architectures), NoSQL databases, and distributed file systems such as the Hadoop Distributed File System (HDFS).
2) We will introduce parallel programming frameworks such as MapReduce and Hadoop Streaming, pleasantly parallel computation using collections of virtual machines, and related techniques.
3) We will show different ways to explore and analyze large datasets managed by Hadoop using open source data analysis tools such as R.
We will illustrate these technologies and techniques with several case studies: managing and analyzing the large datasets produced by next-generation sequencing devices, analyzing astronomy data from the Sloan Digital Sky Survey, analyzing earth science data produced by NASA satellites, and analyzing netflow data.
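As a rough illustration of the MapReduce and Hadoop Streaming topics, here is a minimal sketch (not taken from the tutorial materials) that totals bytes per source IP in netflow-style records. It assumes a hypothetical CSV record layout `src_ip,dst_ip,bytes`. The functions `mapper`, `reducer`, and `run_locally` and the `map`/`reduce` command-line roles are illustrative names, not part of any Hadoop API. Hadoop Streaming runs any executable that reads records on stdin and writes tab-separated key/value lines to stdout. It also sorts the mapper output by key before passing it to the reducer, and `run_locally` imitates that sort in a single process.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'src_ip<TAB>bytes' for each record.

    Assumes a hypothetical CSV record layout: src_ip,dst_ip,bytes
    """
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        src_ip, _dst_ip, nbytes = fields
        yield f"{src_ip}\t{nbytes}"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum bytes per source IP.

    Reducer input is sorted by key, so lines with the same key arrive
    consecutively and groupby can aggregate them in a single pass.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


def run_locally(records: Iterable[str]) -> list[str]:
    """Simulate map -> shuffle/sort -> reduce in one process."""
    mapped = sorted(mapper(records))  # stands in for Hadoop's sort by key
    return list(reducer(mapped))


if __name__ == "__main__":
    # Hypothetical roles: under Hadoop Streaming this script could be run
    # once as the mapper ("map") and once as the reducer ("reduce").
    role = sys.argv[1] if len(sys.argv) > 1 else "local"
    if role == "map":
        for out in mapper(sys.stdin):
            print(out)
    elif role == "reduce":
        for out in reducer(sys.stdin):
            print(out)
    else:
        sample = [
            "10.0.0.1,10.0.0.9,500",
            "10.0.0.2,10.0.0.9,300",
            "10.0.0.1,10.0.0.8,200",
        ]
        for out in run_locally(sample):
            print(out)  # 10.0.0.1<TAB>700, then 10.0.0.2<TAB>300
```

On a cluster, the two roles would be given to the Hadoop Streaming jar through its `-mapper` and `-reducer` options, along with `-input` and `-output` paths. Because each mapper handles its input split independently and each reducer handles a disjoint set of keys, the same pair of small scripts can scale across many nodes.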
Robert Grossman - University of Chicago
Alex Szalay - Johns Hopkins University
Collin Bennett - Open Data Group