BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200227T164241Z
LOCATION:Substanz 1\, 2
DTSTART;TZID=Europe/Stockholm:20190618T083000
DTEND;TZID=Europe/Stockholm:20190618T100000
UID:isc_hpc_ISC High Performance 2019_sess182_post111@linklings.com
SUMMARY:(RP28) Performance Tuning of Deep Learning Framework Chainer on th
e K Computer.
DESCRIPTION:Research Poster\n\n(RP28) Performance Tuning of Deep Learnin
 g Framework Chainer on the K Computer.\n\nKuroda\, Kumahata\, Chiba\, Tak
 ashina\, Minami\n\nRecently\, deep-learning applications and research ha
 ve become popular\, typically using GPUs. However\, many of these calcul
 ations can also be performed on the CPUs of massively parallel computer
 s. Here we introduce performance-tuning procedures for Chainer\, a repr
 esentative deep-learning framework\, on the K computer.\nChainer expres
 ses the hierarchical structure of deep learning in Python\, and all cal
 culations can be performed with NumPy without special libraries. By opt
 imizing the handling of the floating-point underflow exception when bui
 lding Python\, elapsed time was reduced to 1/3.39 of the original. More
 over\, by replacing the SSL2 GEMM library called from Python with the t
 hread-parallel version\, section elapsed time was reduced to 1/4.54\, t
 otal elapsed time to 1/1.15\, and performance efficiency improved by ab
 out 47.0%.\nMuch of the cost lay in the square-root and arithmetic oper
 ations performed during filter updates and in the activation functions
 . These operations are not optimized when computed with NumPy and are p
 articularly slow on the K computer. By replacing these kernels with a F
 ortran library optimized with software pipelining and SIMD instruction
 s\, kernel elapsed time was reduced to 1/11.08 and total elapsed time t
 o 1/16.23.\nSome limitations remain in using Chainer on the K compute
 r. Nevertheless\, deep-learning calculations have become practical on t
 he K computer and the post-K computer through this tuning and the CPU-p
 arallel version of Chainer.\n\nPasses: Conference Pass\, AI/Machine Lea
 rning/Deep Learning\, Performance Analysis and Optimization\n\nTag: Con
 ference Pass\, AI/Machine Learning/Deep Learning\, Performance Analysi
 s and Optimization
URL:https://2019.isc-program.com/presentation/?id=post111&sess=sess182
END:VEVENT
END:VCALENDAR