Biowulf High Performance Computing at the NIH
Deep Learning on Biowulf

Deep learning frameworks such as TensorFlow, Keras, PyTorch, and Caffe2 are available through the centrally installed python modules. Other frameworks, such as MXNet, can be installed in a user's personal conda environment. This page guides you through using the different deep learning frameworks on Biowulf in interactive sessions and via sbatch submission (and, by extension, swarm jobs).

The examples use Python 3.6 and should work with the other available Python versions (2.7 and 3.5) unless otherwise stated. For each framework, a python interpreter is first used to import the library and run a few simple commands. A GitHub repository of the framework's tutorials is then cloned, and an example script, usually basic image-classification training such as CIFAR-10 or MNIST, is run from it.
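
All of the examples below run inside a Slurm GPU allocation. A quick, framework-independent way to confirm that a job actually received GPUs is to inspect the CUDA_VISIBLE_DEVICES variable that the scheduler sets for GPU jobs. This is a stdlib-only sketch; `allocated_gpus` is a name chosen here for illustration, not a Biowulf utility:

```python
# Stdlib-only sketch: inside a Slurm GPU job, the scheduler exports
# CUDA_VISIBLE_DEVICES with the indices of the granted GPUs, e.g. "0" or "0,1".
import os

def allocated_gpus(env=None):
    """Return the list of GPU indices visible to the current job."""
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES", "")
    return [int(i) for i in visible.split(",") if i.strip()]

# Simulated environment; inside a real job just call allocated_gpus().
print(allocated_gpus({"CUDA_VISIBLE_DEVICES": "0,1"}))  # -> [0, 1]
```

If the list is empty inside a job that was supposed to have a GPU, check the --gres option of your allocation before debugging the framework itself.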

TensorFlow

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --gres=gpu:k80:1,lscratch:10 --mem=20g -c14
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load python/3.6
[+] Loading python 3.6  ...

[user@cn3144 ~]$ mkdir -p /data/$USER/deeplearning

[user@cn3144 ~]$ cd /data/$USER/deeplearning

[user@cn3144 ~]$ python
Python 3.6.5 |Anaconda custom (64-bit)| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
/usr/local/Anaconda/envs/py3.6/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
>>> import tensorflow.contrib.eager as tfe
>>> 
>>> tf.enable_eager_execution()
>>> 
>>> print("TensorFlow version: {}".format(tf.VERSION))
TensorFlow version: 1.8.0
>>> print("Eager execution: {}".format(tf.executing_eagerly()))
Eager execution: True
>>> quit()

[user@cn3144 ~]$ git clone https://github.com/tensorflow/models.git

[user@cn3144 ~]$ python models/tutorials/image/cifar10/cifar10_train.py --max_steps=1000
/usr/local/Anaconda/envs/py3.6/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
2018-06-19 14:44:52.770583: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-19 14:44:53.296730: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:83:00.0
totalMemory: 11.92GiB freeMemory: 11.85GiB
2018-06-19 14:44:53.296807: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-06-19 14:44:53.622526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-19 14:44:53.622592: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-06-19 14:44:53.622612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-06-19 14:44:53.623037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11489 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:83:00.0, compute capability: 3.7)
2018-06-19 14:44:56.478164: step 0, loss = 4.68 (334.1 examples/sec; 0.383 sec/batch)
2018-06-19 14:44:56.939725: step 10, loss = 4.64 (2773.2 examples/sec; 0.046 sec/batch)
2018-06-19 14:44:57.235208: step 20, loss = 4.63 (4332.1 examples/sec; 0.030 sec/batch)
2018-06-19 14:44:57.503400: step 30, loss = 4.38 (4772.4 examples/sec; 0.027 sec/batch)
2018-06-19 14:44:57.770756: step 40, loss = 4.36 (4787.6 examples/sec; 0.027 sec/batch)
2018-06-19 14:44:58.032283: step 50, loss = 4.36 (4894.4 examples/sec; 0.026 sec/batch)
[...]
2018-06-19 14:45:25.860690: step 990, loss = 2.38 (4733.6 examples/sec; 0.027 sec/batch)

[user@cn3144 ~]$ cat > submit.sh <<'EOF'
#!/bin/bash
module load python/3.6
cd /data/$USER/deeplearning
python models/tutorials/image/cifar10/cifar10_train.py --max_steps=1000
EOF

[user@cn3144 ~]$ sbatch --partition=gpu --gres=gpu:k80:1,lscratch:10 --mem=20g -c14 submit.sh
3940419
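
The resource requests can equivalently be embedded in the batch script as #SBATCH directives, so the job can be submitted with a plain `sbatch submit.sh`. The directives are standard Slurm; the values mirror the sbatch command line above (a sketch, adjust paths to your own setup):

```shell
# Write a batch script carrying its own resource requests as #SBATCH directives
# (equivalent to passing them on the sbatch command line).
cat > submit.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:k80:1,lscratch:10
#SBATCH --mem=20g
#SBATCH --cpus-per-task=14
module load python/3.6
cd /data/$USER/deeplearning
python models/tutorials/image/cifar10/cifar10_train.py --max_steps=1000
EOF
```

Command-line options to sbatch override the corresponding directives in the script, which is convenient for one-off changes such as switching GPU type.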

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Keras

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --gres=gpu:k80:1,lscratch:10 --mem=20g -c14
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load python/3.6
[+] Loading python 3.6  ...

[user@cn3144 ~]$ mkdir -p /data/$USER/deeplearning

[user@cn3144 ~]$ cd /data/$USER/deeplearning

[user@cn3144 ~]$ python
Python 3.6.5 |Anaconda custom (64-bit)| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from keras.models import Sequential
/usr/local/Anaconda/envs/py3.6/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
>>> model = Sequential()
>>> from keras.layers import Dense
>>> model.add(Dense(units=64, activation='relu', input_dim=100))
>>> model.add(Dense(units=10, activation='softmax'))
>>> quit()

[user@cn3144 ~]$ git clone https://github.com/keras-team/keras.git

[user@cn3144 ~]$ sed -i 's/epochs = 200/epochs = 2/' keras/examples/cifar10_resnet.py

[user@cn3144 ~]$ python keras/examples/cifar10_resnet.py
[...]
Using TensorFlow backend.
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 25s 0us/step
x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples
y_train shape: (50000, 1)
2018-06-19 15:02:16.063830: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-19 15:02:16.600836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:83:00.0
totalMemory: 11.92GiB freeMemory: 11.85GiB
2018-06-19 15:02:16.600903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-06-19 15:02:16.921253: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-19 15:02:16.921315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-06-19 15:02:16.921330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-06-19 15:02:16.921747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11489 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:83:00.0, compute capability: 3.7)
Learning rate:  0.001
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 32, 32, 3)    0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 32, 32, 16)   448         input_1[0][0]                    
__________________________________________________________________________________________________
[...]
Epoch 00002: val_acc improved from 0.41990 to 0.62410, saving model to /spin1/users/teacher/deeplearning/saved_models/cifar10_ResNet20v1_model.002.h5
10000/10000 [==============================] - 4s 368us/step
Test loss: 1.2348264940261842
Test accuracy: 0.6241

[user@cn3144 ~]$ cat > submit.sh <<'EOF'
#!/bin/bash
module load python/3.6
cd /data/$USER/deeplearning
python keras/examples/cifar10_resnet.py
EOF

[user@cn3144 ~]$ sbatch --partition=gpu --gres=gpu:k80:1,lscratch:10 --mem=20g -c14 submit.sh
3940419
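
To run several such trainings at once, the same commands can be expressed as a swarm file, one command line per subjob. The swarm flags below follow Biowulf's swarm utility (-g gigabytes of memory and -t threads per subjob); the file name and the choice of commands are just examples:

```shell
# One command per line; swarm turns each line into its own batch subjob.
cat > deeplearning.swarm <<'EOF'
cd /data/$USER/deeplearning && python keras/examples/cifar10_resnet.py
cd /data/$USER/deeplearning && python models/tutorials/image/cifar10/cifar10_train.py --max_steps=1000
EOF
# Submit (commented out here):
# swarm -f deeplearning.swarm -g 20 -t 14 --partition=gpu --gres=gpu:k80:1 --module python/3.6
```

Each line gets its own GPU allocation, so a swarm of n lines requests n GPUs in total.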

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Keras for R

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --gres=gpu:k80:1,lscratch:10 --mem=20g -c14
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load cuDNN/7.0/CUDA-9.0 CUDA/9.0 R/3.5.0 python/3.5
[+] Loading cuDNN 7.0  libraries... 
[+] Loading CUDA Toolkit  9.0.176  ... 
[+] Loading gcc  7.2.0  ... 
[+] Loading GSL 2.4 for GCC 7.2.0 ... 
[+] Loading openmpi 3.0.0  for GCC 7.2.0 
[+] Loading R 3.5.0_build2 
[+] Loading python 3.5  ... 

[user@cn3144 ~]$ mkdir -p /data/$USER/deeplearning/R

[user@cn3144 ~]$ cd /data/$USER/deeplearning/R

[user@cn3144 ~]$ R
R version 3.5.0 (2018-04-23) -- "Joy in Playing"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

 > library(keras)
 > library(tensorflow)
 > model <- keras_model_sequential()
/usr/local/Anaconda/envs/py3.5/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
 >

[user@cn3144 ~]$ git clone https://github.com/rstudio/keras.git

[user@cn3144 ~]$ Rscript keras/vignettes/examples/mnist_cnn.R
/usr/local/Anaconda/envs/py3.5/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
11493376/11490434 [==============================] - 1s 0us/step
x_train_shape: 60000 28 28 1 
60000 train samples
10000 test samples
Train on 48000 samples, validate on 12000 samples
Epoch 1/12
2018-09-12 15:01:18.244091: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-12 15:01:18.515350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:8a:00.0
totalMemory: 11.92GiB freeMemory: 11.85GiB
2018-09-12 15:01:18.515420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-09-12 15:01:20.166555: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-12 15:01:20.166611: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-09-12 15:01:20.166628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-09-12 15:01:20.167006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11491 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:8a:00.0, compute capability: 3.7)
48000/48000 [==============================] - 12s 241us/step - loss: 0.3010 - acc: 0.9058 - val_loss: 0.0674 - val_acc: 0.9808
Epoch 2/12
48000/48000 [==============================] - 8s 157us/step - loss: 0.0984 - acc: 0.9707 - val_loss: 0.0495 - val_acc: 0.9855
[...]
Epoch 12/12
48000/48000 [==============================] - 8s 157us/step - loss: 0.0254 - acc: 0.9920 - val_loss: 0.0394 - val_acc: 0.9894
Test loss: 0.02953377 
Test accuracy: 0.9904 

[user@cn3144 ~]$ cat > submit.sh <<'EOF'
#!/bin/bash
module load cuDNN/7.0/CUDA-9.0 CUDA/9.0 R/3.5.0 python/3.5
cd /data/$USER/deeplearning/R
Rscript keras/vignettes/examples/mnist_cnn.R
EOF

[user@cn3144 ~]$ sbatch --partition=gpu --gres=gpu:k80:1,lscratch:10 --mem=20g -c14 submit.sh
3940419

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

PyTorch

Allocate an interactive session with X11 forwarding and run the program. Sample session:

[user@biowulf]$ sinteractive --gres=gpu:k80:1,lscratch:10 --mem=20g -c14
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load python/3.6
[+] Loading python 3.6  ...

[user@cn3144 ~]$ mkdir -p /data/$USER/deeplearning

[user@cn3144 ~]$ cd /data/$USER/deeplearning

[user@cn3144 ~]$ python
Python 3.6.5 |Anaconda custom (64-bit)| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from __future__ import print_function
>>> import torch
>>> x = torch.empty(5, 3)
>>> print(x)
tensor([[-6.0496e-13,  1.5305e-41,  2.5110e+13],
        [ 3.0611e-41, -1.0846e-08,  1.5305e-41],
        [-1.0855e-08,  1.5305e-41, -3.1812e-13],
        [ 1.5305e-41, -3.1560e-13,  1.5305e-41],
        [-1.0853e-08,  1.5305e-41, -1.0853e-08]])
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[ 0.6320,  0.4974,  0.1132],
        [ 0.8411,  0.8527,  0.2586],
        [ 0.2586,  0.7206,  0.8066],
        [ 0.5950,  0.4406,  0.8707],
        [ 0.6431,  0.8721,  0.5510]])
>>> quit()
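
Before launching training it is common to pick the device explicitly, using the GPU when one is available and the CPU otherwise. This is the standard PyTorch device-selection idiom, sketched here with an ImportError fallback only so the snippet can be tried in environments without PyTorch:

```python
# Device-selection sketch: prefer the GPU, fall back to the CPU.
try:
    import torch
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.rand(5, 3).to(device)  # move a tensor onto the chosen device
    backend = str(device)
except ImportError:
    backend = "cpu"  # PyTorch not installed in this environment
print("computing on:", backend)
```

On a GPU node with the python/3.6 module loaded this reports "cuda"; the same script still runs on a CPU-only node.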

[user@cn3144 ~]$ git clone https://github.com/pytorch/tutorials.git

# Make sure X11 forwarding is working (e.g. by running `xeyes`); the tutorial displays images.

[user@cn3144 ~]$ python tutorials/beginner_source/blitz/cifar10_tutorial.py
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
Files already downloaded and verified
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-teacher'
  dog  bird   cat  ship
[1,  2000] loss: 2.158
[1,  4000] loss: 1.855
[1,  6000] loss: 1.673
[1,  8000] loss: 1.591
[1, 10000] loss: 1.526
[1, 12000] loss: 1.441
[2,  2000] loss: 1.362
[2,  4000] loss: 1.367
[2,  6000] loss: 1.321
[2,  8000] loss: 1.297
[2, 10000] loss: 1.268
[2, 12000] loss: 1.267
Finished Training
GroundTruth:    cat  ship  ship plane
Predicted:    cat  ship   car plane
Accuracy of the network on the 10000 test images: 52 %
Accuracy of plane : 62 %
Accuracy of   car : 70 %
Accuracy of  bird : 16 %
Accuracy of   cat : 49 %
Accuracy of  deer : 49 %
Accuracy of   dog : 50 %
Accuracy of  frog : 36 %
Accuracy of horse : 61 %
Accuracy of  ship : 62 %
Accuracy of truck : 69 %
cuda:0

[user@cn3144 ~]$ cat > submit.sh <<'EOF'
#!/bin/bash
module load python/3.6
cd /data/$USER/deeplearning
python tutorials/beginner_source/blitz/cifar10_tutorial.py
EOF

[user@cn3144 ~]$ sbatch --partition=gpu --gres=gpu:k80:1,lscratch:10 --mem=20g -c14 submit.sh
3940419

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Caffe2

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --gres=gpu:k80:1,lscratch:10 --mem=20g -c14
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load python nccl/2.2.13_cuda9.0
[+] Loading python 2.7  ... 
[+] Loading CUDA Toolkit  9.0.176  ... 
[+] Loading NCCL for cuda 9.0  2.2.13  ... 

[user@cn3144 ~]$ mkdir -p /data/$USER/deeplearning

[user@cn3144 ~]$ cd /data/$USER/deeplearning

[user@cn3144 ~]$ python
>>> from caffe2.python import workspace, model_helper
>>> import numpy as np
>>> # Create random tensor of three dimensions
... x = np.random.rand(4, 3, 2)
>>> print(x)
[[[0.76794446 0.10728679]
  [0.21646403 0.55296556]
  [0.34029796 0.86811885]]

 [[0.44665673 0.11730879]
  [0.66358693 0.60259659]
  [0.64326743 0.04888184]]

 [[0.74584444 0.39408433]
  [0.80037294 0.6543183 ]
  [0.56033315 0.05735479]]

 [[0.96530297 0.22466116]
  [0.94904576 0.92072679]
  [0.4773546  0.7318874 ]]]
>>> print(x.shape)
(4, 3, 2)
>>> 
>>> workspace.FeedBlob("my_x", x)
True
>>> 
>>> x2 = workspace.FetchBlob("my_x")
>>> print(x2)
[[[0.76794446 0.10728679]
  [0.21646403 0.55296556]
  [0.34029796 0.86811885]]

 [[0.44665673 0.11730879]
  [0.66358693 0.60259659]
  [0.64326743 0.04888184]]

 [[0.74584444 0.39408433]
  [0.80037294 0.6543183 ]
  [0.56033315 0.05735479]]

 [[0.96530297 0.22466116]
  [0.94904576 0.92072679]
  [0.4773546  0.7318874 ]]]
>>> quit()
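
The FeedBlob/FetchBlob pair above is a round trip through the Caffe2 workspace, and the fetched array should equal the one fed in. A compact sanity check of that round trip, with a plain copy standing in where caffe2 is not installed so the check itself runs anywhere:

```python
# Round-trip check: the blob fetched from the workspace should equal the
# array that was fed in.
import numpy as np

x = np.random.rand(4, 3, 2)
try:
    from caffe2.python import workspace
    workspace.FeedBlob("my_x", x)
    x2 = workspace.FetchBlob("my_x")
except ImportError:
    x2 = x.copy()  # caffe2 not available: stand-in for the fetched blob
assert np.array_equal(x, x2)
print("round trip ok:", x2.shape)
```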


[user@cn3144 ~]$ git clone https://github.com/caffe2/caffe2.git

[user@cn3144 ~]$ python caffe2/caffe2/python/models/resnet_test.py
No handlers could be found for logger "caffe2.python.net_drawer"
net_drawer will not run correctly. Please install the correct dependencies.
/usr/local/Anaconda/envs/py2.7/lib/python2.7/site-packages/caffe2/python/hypothesis_test_util.py:75: HypothesisDeprecationWarning: 
The min_satisfying_examples setting has been deprecated and disabled, due to
overlap with the filter_too_much healthcheck and poor interaction with the
max_examples setting.
[...]
I0619 15:57:50.653511 48188 memonger.cc:236] Remapping 126 using 5 shared blobs.
INFO:memonger:Memonger memory optimization took 0.0219659805298 secs
I0619 15:57:51.410488 48188 operator.cc:167] Engine CUDNN is not available for operator Conv.
I0619 15:57:51.410671 48188 operator.cc:167] Engine CUDNN is not available for operator Relu.
I0619 15:57:51.410712 48188 operator.cc:167] Engine CUDNN is not available for operator MaxPool.
[...]
WARNING:memonger:NOTE: Executing memonger to optimize gradient memory
I0619 15:58:16.026832 48188 memonger.cc:236] Remapping 138 using 34 shared blobs.
I0619 15:58:16.026859 48188 memonger.cc:239] Memonger saved approximately : 320.812 MB.
INFO:memonger:Memonger memory optimization took 0.0328068733215 secs
before: 880 after: 776
.
----------------------------------------------------------------------
Ran 3 tests in 39.758s

OK

[user@cn3144 ~]$ cat > submit.sh <<'EOF'
#!/bin/bash
module load python nccl/2.2.13_cuda9.0
cd /data/$USER/deeplearning
python caffe2/caffe2/python/models/resnet_test.py
EOF

[user@cn3144 ~]$ sbatch --partition=gpu --gres=gpu:k80:1,lscratch:10 --mem=20g -c14 submit.sh
3940419

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

MXNet

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --gres=gpu:k80:1,lscratch:10 --mem=20g -c14
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ mkdir -p /data/$USER/deeplearning

[user@cn3144 ~]$ cd /data/$USER/deeplearning

[user@cn3144 ~]$ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
--2018-04-23 15:04:10--  https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving dtn04-e0... 10.1.200.240
Connecting to dtn04-e0|10.1.200.240|:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 58304693 (56M) [application/x-sh]
Saving to: “Miniconda3-latest-Linux-x86_64.sh”
2018-04-23 15:04:11 (118 MB/s) - “Miniconda3-latest-Linux-x86_64.sh” saved [58304693/58304693]

[user@cn3144 ~]$ bash Miniconda3-latest-Linux-x86_64.sh -p /data/$USER/deeplearning/conda -b
PREFIX=/data/teacher/deeplearning/conda
installing: python-3.6.5-hc3d631a_2 ...
Python 3.6.5 :: Anaconda, Inc.
installing: ca-certificates-2018.03.07-0 ...
installing: conda-env-2.6.0-h36134e3_1 ...
[...]
installation finished.

[user@cn3144 ~]$ source /data/$USER/deeplearning/conda/etc/profile.d/conda.sh

[user@cn3144 ~]$ conda activate base

(base) [user@cn3144 deeplearning]$ which python
/data/teacher/deeplearning/conda/bin/python

(base) [user@cn3144 deeplearning]$ module load CUDA
[+] Loading CUDA Toolkit  9.0.176  ...

(base) [user@cn3144 deeplearning]$ pip install mxnet-cu90
Collecting mxnet-cu90
[...]
Successfully installed graphviz-0.8.3 mxnet-cu90-1.2.0 numpy-1.14.5

[user@cn3144 ~]$ python
>>> import mxnet as mx
>>> a = mx.nd.ones((2, 3), mx.gpu())
>>> b = a * 2 + 1
>>> b.asnumpy()
array([[3., 3., 3.],
       [3., 3., 3.]], dtype=float32)
>>> quit()
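
MXNet makes the compute device an explicit context argument: mx.gpu() requires the CUDA build installed above (mxnet-cu90), while mx.cpu() works in any build. A sketch of the same 2*ones+1 computation with a selectable context; the numpy branch is only a fallback for environments without MXNet:

```python
# Context-selection sketch for the computation shown above.
def double_plus_one(use_gpu=False):
    try:
        import mxnet as mx
        ctx = mx.gpu() if use_gpu else mx.cpu()
        return (mx.nd.ones((2, 3), ctx) * 2 + 1).asnumpy()
    except ImportError:
        import numpy as np
        return np.ones((2, 3)) * 2 + 1  # same arithmetic without MXNet

print(double_plus_one())  # 2x3 array filled with 3.0
```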

(base) [user@cn3144 deeplearning]$ git clone https://github.com/apache/incubator-mxnet.git

(base) [user@cn3144 deeplearning]$ cd incubator-mxnet/example/image-classification

(base) [user@cn3144 deeplearning]$ python train_mnist.py --network mlp --num-epochs 2
INFO:root:start with arguments Namespace(add_stn=False, batch_size=64, disp_batches=100, dtype='float32', gc_threshold=0.5, gc_type='none', gpus=None, initializer='default', kv_store='device', load_epoch=None, loss='', lr=0.05, lr_factor=0.1, lr_step_epochs='10', macrobatch_size=0, model_prefix=None, mom=0.9, monitor=0, network='mlp', num_classes=10, num_epochs=2, num_examples=60000, num_layers=None, optimizer='sgd', save_period=1, test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear', wd=0.0001)
[...]
INFO:root:Epoch[1] Batch [700]	Speed: 44104.51 samples/sec	accuracy=0.968906
INFO:root:Epoch[1] Batch [800]	Speed: 45402.42 samples/sec	accuracy=0.971562
INFO:root:Epoch[1] Batch [900]	Speed: 45352.02 samples/sec	accuracy=0.970781
INFO:root:Epoch[1] Train-accuracy=0.970439
INFO:root:Epoch[1] Time cost=1.328
INFO:root:Epoch[1] Validation-accuracy=0.965267

Note the gpus=None in the start-up log above: this run trained on the CPU. To use the allocated GPU, pass its index explicitly, e.g. python train_mnist.py --network mlp --num-epochs 2 --gpus 0.

[user@cn3144 ~]$ cat > submit.sh <<'EOF'
#!/bin/bash
module load CUDA
source /data/$USER/deeplearning/conda/etc/profile.d/conda.sh
conda activate base
cd /data/$USER/deeplearning/incubator-mxnet/example/image-classification
python train_mnist.py --network mlp --num-epochs 2 --gpus 0
EOF

[user@cn3144 ~]$ sbatch --partition=gpu --gres=gpu:k80:1,lscratch:10 --mem=20g -c14 submit.sh
3940419

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$