BEGIN:VCALENDAR PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN VERSION:1.0 BEGIN:VEVENT DTSTART:20121114T223000Z DTEND:20121114T230000Z LOCATION:355-EF DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: DBSCAN is a well-known density based clustering algorithm capable of discovering arbitrary shaped clusters and eliminating noise data. However, parallelization of DBSCAN is challenging as it exhibits an inherent sequential data access order. Moreover, existing parallel implementations adopt a master-slave strategy which can easily cause an unbalanced workload resulting low parallel efficiency.=0A=0AWe present a new parallel DBSCAN algorithm (PDSDBSCAN) using graph algorithmic concepts. More specifically, we employ the disjoint-set data structure to break the access sequentiality of DBSCAN. In addition, we use a tree-based bottom-up approach to construct the clusters. This yields a better-balanced workload distribution. We implement the algorithm both for shared and for distributed memory.=0A=0AUsing data sets containing several hundred million high-dimensional points, we show that PDSDBCAN significantly outperforms the master-slave approach, achieving speedups up to 30.3 using 40 cores on shared memory architecture, and speedups up to 5,765 using 8,192 cores on distributed memory architecture. SUMMARY:A New Scalable Parallel DBSCAN Algorithm Using the Disjoint-Set Data Structure PRIORITY:3 END:VEVENT END:VCALENDAR BEGIN:VCALENDAR PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN VERSION:1.0 BEGIN:VEVENT DTSTART:20121114T223000Z DTEND:20121114T230000Z LOCATION:355-EF DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: DBSCAN is a well-known density based clustering algorithm capable of discovering arbitrary shaped clusters and eliminating noise data. However, parallelization of DBSCAN is challenging as it exhibits an inherent sequential data access order. Moreover, existing parallel implementations adopt a master-slave strategy which can easily cause an unbalanced workload resulting low parallel efficiency.=0A=0AWe present a new parallel DBSCAN algorithm (PDSDBSCAN) using graph algorithmic concepts. More specifically, we employ the disjoint-set data structure to break the access sequentiality of DBSCAN. In addition, we use a tree-based bottom-up approach to construct the clusters. This yields a better-balanced workload distribution. We implement the algorithm both for shared and for distributed memory.=0A=0AUsing data sets containing several hundred million high-dimensional points, we show that PDSDBCAN significantly outperforms the master-slave approach, achieving speedups up to 30.3 using 40 cores on shared memory architecture, and speedups up to 5,765 using 8,192 cores on distributed memory architecture. SUMMARY:A New Scalable Parallel DBSCAN Algorithm Using the Disjoint-Set Data Structure PRIORITY:3 END:VEVENT END:VCALENDAR