Agenda Planner ISC 2017 - Session Details

JUNE 18–22, 2017
FRANKFURT AM MAIN, GERMANY

Name:		BoF 16: Scaling Up/Out Deep Learning on HPC Clusters

Time:		Wednesday, June 21, 2017 08:30 am - 09:30 am

Room:		Kontrast

Breaks:		08:00 am - 09:00 am Welcome Coffee

Speaker:		David N. Lombard, Intel
		Jun Nakajima, Intel
		Matthieu Ospici, Atos
		Karl W. Schulz, Intel

Abstract:		Deep learning techniques are increasingly used in many areas because they can equal or even surpass human-level performance on object recognition and classification problems. To reach such performance, the underlying neural network must contain many layers (a very deep network) and the model must be trained on a huge dataset. Consequently, training is highly compute- and I/O-intensive. Furthermore, the development workflow requires many iterations to empirically evaluate the best neural network architecture. Each cycle involves training the model, which can be time consuming (e.g. days or weeks).

		HPC clusters equipped with accelerators (such as GPUs and FPGAs) and a low-latency network are a logical choice for running this kind of application, in particular during the development phase, to shorten training and improve development productivity. Scaling up/out deep learning involves different parallel processing techniques, namely data parallelism and model parallelism, which require iterative synchronization across the cluster nodes.

		This BoF addresses the use of HPC clusters for training deep learning models, with the following agenda:
		- a brief introduction to deep learning science
		- an example of a deep learning application with TensorFlow
		- an overview of the motivations for using HPC technologies, and the challenges
		- technologies to scale up/out training of deep learning on HPC clusters
		- the different ways to implement a “Deep learning as a service” stack on HPC
		- open discussions

		Targeted audience: Anybody interested in large-scale deep learning, especially its technical problems (e.g. various bottlenecks), solutions, and the advantages of using HPC clusters. Attendees will also learn about popular machine learning frameworks such as TensorFlow.
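
The data parallelism and iterative synchronization mentioned in the abstract can be illustrated with a small sketch. It is not taken from the session material: it is a minimal, single-process simulation in plain Python, and the names `local_gradient`, `all_reduce_mean`, and `train` are hypothetical. Each simulated worker holds a replica of a toy one-parameter model and its own shard of the data. On a real cluster the per-worker gradients would be computed in parallel and combined by a collective operation over the interconnect.

```python
# Illustrative sketch only: synchronous data parallelism on a toy linear model.
# All function names here are hypothetical and not part of any framework.

def local_gradient(w, shard):
    """Gradient of the mean squared error for y ~ w * x over one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for a cluster all-reduce: every worker would receive this mean."""
    return sum(values) / len(values)

def train(data, num_workers=4, lr=0.01, steps=200):
    # Each worker keeps a full model replica (the scalar w) and one data shard.
    shards = [data[i::num_workers] for i in range(num_workers)]
    w = 0.0
    for _ in range(steps):
        # On a real cluster these gradients are computed concurrently, one per node.
        grads = [local_gradient(w, shard) for shard in shards]
        # Synchronization point: all replicas apply the same averaged gradient.
        w -= lr * all_reduce_mean(grads)
    return w

# Toy dataset generated from y = 3 * x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w = train(data)
```

Because the shards are of equal size, averaging the per-worker gradients gives the same update as a single worker processing the full batch. The synchronization step runs once per iteration, which is why interconnect latency matters when scaling out.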