SC12 Home > SC12 Schedule > SC12 Presentation - A New Scalable Parallel DBSCAN Algorithm Using the Disjoint-Set Data Structure

SCHEDULE: NOV 10-16, 2012

When viewing the Technical Program schedule, on the far righthand side is a column labeled "PLANNER." Use this planner to build your own schedule. Once you select an event and want to add it to your personal schedule, just click on the calendar icon of your choice (outlook calendar, ical calendar or google calendar) and that event will be stored there. As you select events in this manner, you will have your own schedule to guide you through the week.

A New Scalable Parallel DBSCAN Algorithm Using the Disjoint-Set Data Structure

SESSION: Graph Algorithms

EVENT TYPE: Papers

TIME: 3:30PM - 4:00PM

SESSION CHAIR: Esmond G. Ng

AUTHOR(S):Md. Mostofa Ali Patwary, Diana Palsetia, Ankit Agrawal, Wei-keng Liao, Fredrik Manne, Alok Choudhary

ROOM:355-EF

ABSTRACT:
DBSCAN is a well-known density based clustering algorithm capable of discovering arbitrary shaped clusters and eliminating noise data. However, parallelization of DBSCAN is challenging as it exhibits an inherent sequential data access order. Moreover, existing parallel implementations adopt a master-slave strategy which can easily cause an unbalanced workload resulting low parallel efficiency. We present a new parallel DBSCAN algorithm (PDSDBSCAN) using graph algorithmic concepts. More specifically, we employ the disjoint-set data structure to break the access sequentiality of DBSCAN. In addition, we use a tree-based bottom-up approach to construct the clusters. This yields a better-balanced workload distribution. We implement the algorithm both for shared and for distributed memory. Using data sets containing several hundred million high-dimensional points, we show that PDSDBCAN significantly outperforms the master-slave approach, achieving speedups up to 30.3 using 40 cores on shared memory architecture, and speedups up to 5,765 using 8,192 cores on distributed memory architecture.

Chair/Author Details:

Esmond G. Ng (Chair) - Lawrence Berkeley National Laboratory

Md. Mostofa Ali Patwary - Northwestern University

Diana Palsetia - Northwestern University

Ankit Agrawal - Northwestern University

Wei-keng Liao - Northwestern University

Fredrik Manne - University of Bergen

Alok Choudhary - Northwestern University

Add to iCal  Click here to download .ics calendar file

Add to Outlook  Click here to download .vcs calendar file

Add to Google Calendarss  Click here to add event to your Google Calendar

A New Scalable Parallel DBSCAN Algorithm Using the Disjoint-Set Data Structure

SESSION: Graph Algorithms

EVENT TYPE:

TIME: 3:30PM - 4:00PM

SESSION CHAIR: Esmond G. Ng

AUTHOR(S):Md. Mostofa Ali Patwary, Diana Palsetia, Ankit Agrawal, Wei-keng Liao, Fredrik Manne, Alok Choudhary

ROOM:355-EF

ABSTRACT:
DBSCAN is a well-known density based clustering algorithm capable of discovering arbitrary shaped clusters and eliminating noise data. However, parallelization of DBSCAN is challenging as it exhibits an inherent sequential data access order. Moreover, existing parallel implementations adopt a master-slave strategy which can easily cause an unbalanced workload resulting low parallel efficiency. We present a new parallel DBSCAN algorithm (PDSDBSCAN) using graph algorithmic concepts. More specifically, we employ the disjoint-set data structure to break the access sequentiality of DBSCAN. In addition, we use a tree-based bottom-up approach to construct the clusters. This yields a better-balanced workload distribution. We implement the algorithm both for shared and for distributed memory. Using data sets containing several hundred million high-dimensional points, we show that PDSDBCAN significantly outperforms the master-slave approach, achieving speedups up to 30.3 using 40 cores on shared memory architecture, and speedups up to 5,765 using 8,192 cores on distributed memory architecture.

Chair/Author Details:

Esmond G. Ng (Chair) - Lawrence Berkeley National Laboratory

Md. Mostofa Ali Patwary - Northwestern University

Diana Palsetia - Northwestern University

Ankit Agrawal - Northwestern University

Wei-keng Liao - Northwestern University

Fredrik Manne - University of Bergen

Alok Choudhary - Northwestern University

Add to iCal  Click here to download .ics calendar file

Add to Outlook  Click here to download .vcs calendar file

Add to Google Calendarss  Click here to add event to your Google Calendar