JUNE 18–22, 2017
FRANKFURT AM MAIN, GERMANY

Presentation Details

 
Name: (RP03) A Portable Distributed Sparse Grid Density Estimation for Big Data Clustering
 
Time: Tuesday, June 20, 2017
08:35 am - 09:45 am
 
Room:   Substanz 1+2  
 
Breaks: 07:30 am - 10:00 am Welcome Coffee
 
Presenter:   David Pfander, University of Stuttgart
 
Abstract:  
The clustering of data points is one of the central tasks in data mining. For Big Data scenarios with millions to billions of data points, highly efficient algorithms are required. We present an accelerator-enabled distributed clustering algorithm. It is based on a spatial discretization using sparse grids. Our clustering algorithm uses density estimation of the dataset to prune a nearest neighbor graph of the dataset. A key benefit of the sparse grid density estimation is that it scales linearly in the size of the dataset and is therefore well suited for vast datasets. We have realized implementations in OpenCL that efficiently exploit CPUs and accelerator cards of different vendors. First results show good scaling behavior on 64 nodes of Piz Daint, a large Nvidia Pascal installation, for synthetic datasets with up to 10 dimensions and 10 million data points. On the node level, we achieve between 23% and 50% of the peak performance on hardware platforms of different vendors. As the instruction mix limits us to two thirds of the peak performance, this corresponds to up to 76% of the practically achievable peak performance. Our approach displays good scalability, high node-level performance and performance portability.
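
The idea of pruning a nearest neighbor graph with a density estimate can be illustrated with a minimal Python sketch. It is not the authors' implementation: a brute-force k-nearest-neighbor search and a plain Gaussian kernel estimate stand in for the distributed OpenCL kernels and the sparse grid density estimation, and the pruning rule (drop low-density points and edges whose midpoint lies in a low-density region), the function names and the parameters k, bandwidth and threshold are assumptions chosen for illustration.

```python
import math
import random


def knn_graph(points, k):
    """Brute-force k-nearest-neighbor lists (O(n^2); for illustration only)."""
    graph = []
    for i, p in enumerate(points):
        by_dist = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in by_dist[:k]])
    return graph


def density(x, points, bandwidth):
    """Unnormalized Gaussian kernel estimate in [0, 1].

    Assumption: this is only a stand-in for the sparse grid density estimate.
    """
    total = sum(math.exp(-math.dist(x, p) ** 2 / (2.0 * bandwidth ** 2)) for p in points)
    return total / len(points)


def cluster(points, k=10, bandwidth=0.1, threshold=0.1):
    """Label points by connected components of the density-pruned k-NN graph.

    Points in low-density regions receive the label -1 (noise).
    """
    n = len(points)
    keep = [density(p, points, bandwidth) >= threshold for p in points]
    adjacency = [set() for _ in range(n)]
    for i, neighbors in enumerate(knn_graph(points, k)):
        for j in neighbors:
            if not (keep[i] and keep[j]):
                continue  # prune edges that touch a low-density point
            midpoint = tuple((a + b) / 2.0 for a, b in zip(points[i], points[j]))
            # Assumed rule: also prune edges that cross a low-density region.
            if density(midpoint, points, bandwidth) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)

    # The remaining connected components are the clusters.
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if not keep[start] or labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            node = stack.pop()
            for other in adjacency[node]:
                if labels[other] == -1:
                    labels[other] = next_label
                    stack.append(other)
        next_label += 1
    return labels


# Two dense blobs joined by a sparse bridge of three points.
rng = random.Random(0)
blob_a = [(rng.gauss(0.25, 0.03), rng.gauss(0.25, 0.03)) for _ in range(50)]
blob_b = [(rng.gauss(0.75, 0.03), rng.gauss(0.75, 0.03)) for _ in range(50)]
bridge = [(0.45, 0.45), (0.50, 0.50), (0.55, 0.55)]
points = blob_a + blob_b + bridge
labels = cluster(points)
```

In this toy dataset the bridge points link the two blobs in the unpruned graph. Because the density around the bridge is low, pruning removes those points and their edges, so the blobs end up in separate clusters. Calling cluster(points, threshold=0.0) disables the pruning and shows the difference.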

Authors:
David Pfander, Universität Stuttgart
Gregor Daiß, Universität Stuttgart
Dirk Pflüger, Universität Stuttgart
 
 
Download

RP03_Pfander.pdf (14147 KB)