SCHEDULE: NOV 10-16, 2012
When viewing the Technical Program schedule, on the far righthand side is a column labeled "PLANNER." Use this planner to build your own schedule. Once you select an event and want to add it to your personal schedule, just click on the calendar icon of your choice (outlook calendar, ical calendar or google calendar) and that event will be stored there. As you select events in this manner, you will have your own schedule to guide you through the week.
A New Scalable Parallel DBSCAN Algorithm Using the Disjoint-Set Data Structure
SESSION: Graph Algorithms
EVENT TYPE: Papers
TIME: 3:30PM - 4:00PM
SESSION CHAIR: Esmond G. Ng
AUTHOR(S):Md. Mostofa Ali Patwary, Diana Palsetia, Ankit Agrawal, Wei-keng Liao, Fredrik Manne, Alok Choudhary
ROOM:355-EF
ABSTRACT:
DBSCAN is a well-known density based clustering algorithm capable of discovering arbitrary shaped clusters and eliminating noise data. However, parallelization of DBSCAN is challenging as it exhibits an inherent sequential data access order. Moreover, existing parallel implementations adopt a master-slave strategy which can easily cause an unbalanced workload resulting low parallel efficiency.
We present a new parallel DBSCAN algorithm (PDSDBSCAN) using graph algorithmic concepts. More specifically, we employ the disjoint-set data structure to break the access sequentiality of DBSCAN. In addition, we use a tree-based bottom-up approach to construct the clusters. This yields a better-balanced workload distribution. We implement the algorithm both for shared and for distributed memory.
Using data sets containing several hundred million high-dimensional points, we show that PDSDBCAN significantly outperforms the master-slave approach, achieving speedups up to 30.3 using 40 cores on shared memory architecture, and speedups up to 5,765 using 8,192 cores on distributed memory architecture.
Chair/Author Details:
Esmond G. Ng (Chair) - Lawrence Berkeley National Laboratory
Md. Mostofa Ali Patwary - Northwestern University
Diana Palsetia - Northwestern University
Ankit Agrawal - Northwestern University
Wei-keng Liao - Northwestern University
Fredrik Manne - University of Bergen
Alok Choudhary - Northwestern University
Click here to download .ics calendar file
Click here to download .vcs calendar file
Click here to add event to your Google Calendar
A New Scalable Parallel DBSCAN Algorithm Using the Disjoint-Set Data Structure
SESSION: Graph Algorithms
EVENT TYPE:
TIME: 3:30PM - 4:00PM
SESSION CHAIR: Esmond G. Ng
AUTHOR(S):Md. Mostofa Ali Patwary, Diana Palsetia, Ankit Agrawal, Wei-keng Liao, Fredrik Manne, Alok Choudhary
ROOM:355-EF
ABSTRACT:
DBSCAN is a well-known density based clustering algorithm capable of discovering arbitrary shaped clusters and eliminating noise data. However, parallelization of DBSCAN is challenging as it exhibits an inherent sequential data access order. Moreover, existing parallel implementations adopt a master-slave strategy which can easily cause an unbalanced workload resulting low parallel efficiency.
We present a new parallel DBSCAN algorithm (PDSDBSCAN) using graph algorithmic concepts. More specifically, we employ the disjoint-set data structure to break the access sequentiality of DBSCAN. In addition, we use a tree-based bottom-up approach to construct the clusters. This yields a better-balanced workload distribution. We implement the algorithm both for shared and for distributed memory.
Using data sets containing several hundred million high-dimensional points, we show that PDSDBCAN significantly outperforms the master-slave approach, achieving speedups up to 30.3 using 40 cores on shared memory architecture, and speedups up to 5,765 using 8,192 cores on distributed memory architecture.
Chair/Author Details:
Esmond G. Ng (Chair) - Lawrence Berkeley National Laboratory
Md. Mostofa Ali Patwary - Northwestern University
Diana Palsetia - Northwestern University
Ankit Agrawal - Northwestern University
Wei-keng Liao - Northwestern University
Fredrik Manne - University of Bergen
Alok Choudhary - Northwestern University
Click here to download .ics calendar file