Presentation


Machine Learning Day

On Scale-out Deep Learning Training for Cloud and HPC

Session

Scalable Machine Learning Systems

Speaker

Pradeep Dubey

Event Type

Machine Learning Day

Time

Wednesday, June 27th, 1:45pm - 2:15pm

Location

Panorama 2

Description

The exponential growth of Artificial Intelligence (AI) and Deep Learning (DL) has made it increasingly necessary to train deep neural networks in hours or even minutes. This can only be achieved through scalable and efficient distributed training, since a single node or card cannot satisfy the compute, memory, and I/O requirements of today's state-of-the-art neural networks. However, scaling Stochastic Gradient Descent (SGD) remains a challenging problem that requires continued research and development, with innovations spanning algorithms, frameworks, communication libraries, and system design. In this talk, we describe the philosophy, design, and implementation of the Intel Machine Learning Scalability Library (MLSL) and its support in popular DL frameworks, and we present proof points demonstrating DL training scaled to hundreds and thousands of nodes across Cloud and HPC systems.
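The abstract does not show how MLSL itself works. As a rough illustration of what "scaling SGD" across nodes involves, here is a minimal sketch of synchronous data-parallel SGD: each worker computes a gradient on its own data shard, and an allreduce-style average gives every worker the same update. This is an assumption-laden toy, not MLSL's API. The allreduce is simulated in-process, the one-parameter linear model and the data are made up, and the names `local_gradient`, `allreduce_mean`, and `data_parallel_sgd` are hypothetical.

```python
"""Toy sketch of synchronous data-parallel SGD (hypothetical; not MLSL's API)."""
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for a 1-D linear model y = w * x


def local_gradient(w: float, shard: List[Sample]) -> float:
    # Gradient of the mean of 0.5 * (w*x - y)^2 over this worker's shard.
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values: List[float]) -> float:
    # In-process stand-in for a collective allreduce: every worker gets the mean.
    return sum(values) / len(values)


def data_parallel_sgd(shards: List[List[Sample]], lr: float, steps: int) -> float:
    w = 0.0  # all model replicas start from identical weights
    for _ in range(steps):
        # On a real cluster these gradients are computed concurrently, one per node.
        grads = [local_gradient(w, shard) for shard in shards]
        # Every replica applies the same averaged gradient, so replicas stay in sync.
        w -= lr * allreduce_mean(grads)
    return w


if __name__ == "__main__":
    # Hypothetical toy data drawn from y = 3x, split evenly across 4 workers.
    data = [(float(i), 3.0 * i) for i in range(1, 9)]
    shards = [data[i::4] for i in range(4)]
    print(data_parallel_sgd(shards, lr=0.01, steps=200))
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so the parallel run follows the same trajectory as single-node SGD on the combined batch. At scale, the practical difficulty shifts to making the gradient exchange fast relative to computation, which is where communication libraries come in.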

Speaker

Pradeep Dubey

Intel Fellow; Director, Parallel Computing Lab (PCL)

Intel