High-Performance Computing at the NIH
Caffe2: a new lightweight, modular, and scalable deep learning framework

Caffe2 aims to provide an easy and straightforward way for you to experiment with deep learning and leverage community contributions of new models and algorithms. You can bring your creations to scale using the power of GPUs in the cloud or to the masses on mobile with Caffe2's cross-platform libraries.

Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session on a CPU or GPU node and run the program. Sample session on a GPU node:

[user@biowulf ~]$ sinteractive --gres=gpu:k20x:1 
[user@cn3123 user]$ module load Caffe2 

[+] Loading singularity 2.4.1 on cn3123

[+] Loading Caffe2 0.8.1 …

From this point, the current installation provides two options:
1) The command
[user@cn3123 user]$ caffe2
brings the user into the Singularity container shell:
Singularity caffe2_gpu.sqsh:~>
From this shell, one can run any script or command available within the container, on any data accessible from inside the container. For example, the following command runs a built-in test:
Singularity caffe2_gpu.sqsh:~> python /caffe2/caffe2/python/models/resnet_test.py 
INFO:memonger:Remapping 114 blobs, using 6 shared; saved apprx 346 MB
INFO:memonger:Could not infer sizes for: set([])
INFO:memonger:Memonger memory optimization took 0.0323729515076 secs
before: 880 after: 772
Ran 3 tests in 36.959s

2) The same command, followed by another Linux command, runs that second command inside the container without explicitly entering the container shell. For example:
[user@cn3123 user]$ caffe2 python /caffe2/caffe2/python/models/resnet_test.py 
Thus, the following sequence of two commands
[user@cn3123 user]$ caffe2
Singularity caffe2_gpu.sqsh:~> python
produces the same result as the single command
[user@cn3123 user]$ caffe2 python 
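The two invocation styles above correspond to the two standard Singularity subcommands, `singularity shell` (interactive shell) and `singularity exec` (run one command and return). The following is a minimal sketch of how a wrapper of this kind might dispatch between them. It is not the actual `caffe2` wrapper installed on the cluster: the function name `container_cmd` and the image path in `IMAGE` are hypothetical, and the function only prints the command line it would run rather than running it.

```shell
#!/bin/bash
# Hypothetical sketch; the real caffe2 wrapper on the cluster may differ.
# IMAGE is an assumed placeholder, not the actual location of the container image.
IMAGE="${IMAGE:-/path/to/caffe2_gpu.sqsh}"

# Print the singularity command line that corresponds to the given arguments.
container_cmd() {
    if [ "$#" -eq 0 ]; then
        # No arguments: open an interactive shell inside the container.
        echo "singularity shell $IMAGE"
    else
        # Arguments given: run them inside the container, then return.
        echo "singularity exec $IMAGE $*"
    fi
}

container_cmd
container_cmd python /caffe2/caffe2/python/models/resnet_test.py
```

Printing the command instead of executing it keeps the sketch safe to try on any machine, including one without Singularity installed.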
Furthermore, the Caffe2 container installed on GPU nodes also includes Facebook’s Detectron application, which works together with Caffe2’s detectron module, as verified by the following test command:
Singularity caffe2_gpu.sqsh:~> python2 /Detectron/tests/test_spatial_narrow_as_op.py
Ran 3 tests in 2.340s
To exit the container shell and the interactive session, type:
Singularity caffe2_gpu.sqsh:~> exit 
[user@cn3123 user]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. caffe2.sh). For example:

#!/bin/bash
# Stop at the first command that fails
set -e
module load Caffe2
cd /data/$USER
# Run the built-in test inside the container
caffe2 python /caffe2/caffe2/python/models/resnet_test.py

Submit this job using the Slurm sbatch command.

sbatch caffe2.sh
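Before submitting, it can be useful to check the batch file for shell syntax errors. The sketch below writes the example script to a temporary directory and parses it with `bash -n`, which reads the script without executing any of its commands. The temporary location is an arbitrary choice for illustration. The GPU options in the final comment mirror the interactive example above, but the partition name `gpu` is an assumption and should be checked against the cluster's own documentation.

```shell
#!/bin/bash
# Write the example batch script to a temporary directory (location is arbitrary).
script="$(mktemp -d)/caffe2.sh"
cat > "$script" <<'EOF'
#!/bin/bash
set -e
module load Caffe2
cd /data/$USER
caffe2 python /caffe2/caffe2/python/models/resnet_test.py
EOF

# bash -n parses the script without running it and reports syntax errors.
bash -n "$script" && echo "syntax OK: $script"

# Submit from a login node once the check passes, for example:
#   sbatch "$script"
# Assumption: a GPU run would also need a GPU request, possibly along the lines of
#   sbatch --partition=gpu --gres=gpu:k20x:1 "$script"
```

The quoted `'EOF'` delimiter keeps `$USER` unexpanded in the generated file, so it is resolved when the job runs rather than when the file is written.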