JUNE 18–22, 2017
FRANKFURT AM MAIN, GERMANY

Session Details

 
Name: BoF 11: Accelerating Big Data Processing System Software on Modern HPC Clusters
 
Time: Tuesday, June 20, 2017
01:45 pm - 02:45 pm
 
Room:   Kontrast  
 
Speakers:   Richard Graham, Mellanox
  Francis Lam, Huawei
  Xiaoyi Lu, Ohio State University
  Yutong Lu, NUDT
  Dhabaleswar K. Panda, Ohio State University
  John Shalf, LBNL
 
Abstract:   Many Big Data processing systems are gaining momentum in industry. Apache Hadoop and Spark have become standard tools for handling Big Data and analytics in IT companies. Similarly, Memcached is becoming important for large-scale query processing in Web 2.0 environments. Recent studies have shown that current-generation Hadoop, Spark, and Memcached cannot efficiently leverage the high-performance networking and storage architectures of modern HPC clusters, such as Remote Direct Memory Access (RDMA)-enabled high-performance interconnects and heterogeneous, high-speed storage systems (e.g., HDD, SSD, NVMe-SSD, NVRAM, and Lustre). These systems are traditionally written with sockets and do not deliver the best performance on modern high-performance networks. In this BoF, we will organize several talks to give an in-depth overview of the architecture of popular Big Data processing system software (e.g., Hadoop, Spark, Flink, and Memcached). All the speakers and the audience will be involved in identifying the most critical challenges currently facing the community in redesigning the internal components of this system software for modern interconnects and protocols with RDMA (such as InfiniBand, iWARP, and RoCE), accelerators, and storage architectures. We will also solicit feedback from the community to develop a roadmap for the next 5–10 years on how to efficiently handle the grand challenges associated with Big Data processing on modern HPC clusters.

Targeted Audience
This BoF is targeted at several categories of people working in the areas of HPC and Big Data. The intended audience includes:
- Scientists, engineers, researchers, and students engaged in designing next-generation Big Data system software and applications
- Designers and developers of high-performance Big Data system software, such as Hadoop, Spark, and Memcached
- Newcomers to the field of Big Data who are interested in familiarizing themselves with system software, RDMA, high-performance networking and storage, accelerators, etc.
- Managers and administrators responsible for setting up next-generation Big Data environments and high-end systems in their organizations or laboratories