JUNE 18–22, 2017

Session Details

Name: Tutorial 11: InfiniBand & High-Speed Ethernet: Advanced Features, Challenges in Designing HEC Systems & Usage
Time: Sunday, June 18, 2017
02:00 pm - 06:00 pm
Room:   Konstant
Messe Frankfurt
Breaks: 04:00 pm - 04:30 pm Coffee Break
Presenter:   Dhabaleswar K. Panda, Ohio State University
  Hari Subramoni, Ohio State University
Abstract:   As InfiniBand (IB) and High-Speed Ethernet (HSE) technologies mature, they are being used to design and deploy different kinds of High-End Computing (HEC) systems: HPC clusters with accelerators (GPGPUs and Xeon Phi) supporting MPI, Storage and Parallel File Systems, Cloud Computing systems with SR-IOV Virtualization, Big Data systems with Hadoop (HDFS, MapReduce and HBase) and Spark, Multi-tier Datacenters with Web 2.0 (memcached), Deep Learning middleware, and Grid Computing systems. These systems bring new challenges in performance, scalability, portability, reliability and network congestion. Many scientists, engineers, researchers, managers and system administrators are interested in learning about these challenges, the approaches being used to solve them, and the associated impact on performance and scalability. This tutorial will start with an overview of these systems. It will then emphasize the advanced hardware and software features of IB, HSE and RoCE and their capabilities to address these challenges. Next, we will focus on RDMA programming (OpenFabrics and Libfabric) and on the network management infrastructure and tools needed to use these systems effectively. A common set of challenges faced while designing these systems will be presented. Finally, case studies focusing on domain-specific challenges in designing these systems, their solutions, and sample performance numbers will be presented.