High-Performance Computing at the NIH


Interactive Deep Learning GPU Training System

The NVIDIA Deep Learning GPU Training System (DIGITS) can be used to rapidly train highly accurate deep neural networks (DNNs) for image classification, segmentation and object detection tasks.

DIGITS simplifies common deep learning tasks such as managing data, designing and training neural networks on multi-GPU systems, monitoring performance in real time with advanced visualizations, and selecting the best performing model from the results browser for deployment. DIGITS is completely interactive so that data scientists can focus on designing and training networks rather than programming and debugging.

Web sites

Running DIGITS on a GPU node

DIGITS is driven through an interactive web GUI and requires GPUs, so the only sensible way to run it on Biowulf is in an interactive session on a GPU node.

First, allocate an interactive session on a GPU node. (User input is the text following the prompt.)

[user@biowulf ~]$ sinteractive --constraint=gpuk20x --gres=gpu:k20x:1
salloc.exe: Pending job allocation 35925841
salloc.exe: job 35925841 queued and waiting for resources
salloc.exe: job 35925841 has been allocated resources
salloc.exe: Granted job allocation 35925841
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0605 are ready for job
srun: error: x11: no local DISPLAY defined, skipping

You may change these arguments to fit your specific needs. See the Biowulf user's guide for more information.

Get the IP address of the compute node. You'll need it later.

[user@cn0605 ~]$ hostname -I

This prints two IP addresses. Make note of the first one (the one starting with 10.1.).

Next, define a directory for DIGITS to work in using the DIGITS_JOBS_DIR environment variable. For example, to use a directory called digits in your data directory:
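As a minimal sketch, the first address can be captured in a variable instead of being copied by hand. The helper first_ip and the variable NODE_IP are hypothetical names introduced here for illustration; nothing in DIGITS or Biowulf reads them.

```shell
# Print the first whitespace-separated address from a list such as the
# output of `hostname -I`.
first_ip() {
    printf '%s\n' "$1" | awk '{print $1}'
}

# On the compute node: keep the first address for later use.
# NODE_IP is a hypothetical variable name chosen for this sketch.
NODE_IP=$(first_ip "$(hostname -I 2>/dev/null || true)")
echo "Node IP: $NODE_IP"
```

If the printed address does not start with 10.1., fall back to reading the output of hostname -I directly.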

[user@cn0605 ~]$ mkdir /data/$USER/digits

[user@cn0605 ~]$ export DIGITS_JOBS_DIR=/data/$USER/digits
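The two commands above could also be wrapped in a small function, sketched here under the assumption that you want the directory created only if it is missing (mkdir -p does not fail when it already exists). The function name setup_digits_dir is hypothetical.

```shell
# Create a "digits" working directory under the given base directory and
# export DIGITS_JOBS_DIR so that it points at the new directory.
setup_digits_dir() {
    DIGITS_JOBS_DIR="$1/digits"
    mkdir -p "$DIGITS_JOBS_DIR" || return 1
    export DIGITS_JOBS_DIR
}

# Example call on Biowulf, using the data directory from the text:
# setup_digits_dir "/data/$USER"
```

Because the variable is exported, it stays visible to the DIGITS server when it is started later from the same shell.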

Now pick a port number (preferably greater than 50,000) and start the DIGITS server like so.

[user@cn0605 ~]$ module load digits
[+] Loading digits 5.0 on cn0616
[+] Loading singularity 2.2.1 on cn0616

[user@cn0605 ~]$ digits-devserver -p 56841
WARNING: Bind file source does not exist on host: /etc/resolv.conf
   ___ ___ ___ ___ _____ ___
  |   \_ _/ __|_ _|_   _/ __|
  | |) | | (_ || |  | | \__ \
  |___/___\___|___| |_| |___/ 5.0.0
WARNING: log_file config option not found - no log file is being saved
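Rather than choosing a port by hand, one could pick one at random from the range above 50,000. The following is only a sketch: pick_port and PORT are hypothetical names, and the function does not check whether another process already uses the port.

```shell
# Print a pseudo-random port number in the range 50001-65535.
pick_port() {
    awk 'BEGIN { srand(); print 50001 + int(rand() * 15535) }'
}

PORT=$(pick_port)   # hypothetical variable name
echo "Chosen port: $PORT"

# The server would then be started as shown in the text:
# digits-devserver -p "$PORT"
```

If the server fails to start because the port is taken, call pick_port again after a second or so (awk seeds its generator from the clock) or choose another number manually.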

Now open a Firefox web browser on Biowulf using a graphical client such as NX or X11. In the address bar, enter the IP address and the port number separated by a colon. For instance, using the IP address and port number from the example above, you would enter

[Image: IP address and port number in the browser address bar]
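The text to type into the address bar can be assembled as in this short sketch. The function name browser_address is hypothetical, and NODE_IP stands for a variable you would set yourself to the node address noted earlier.

```shell
# Join an IP address and a port with a colon, in the form typed into the
# browser address bar.
browser_address() {
    printf '%s:%s\n' "$1" "$2"
}

# Example, assuming NODE_IP holds the node address and 56841 is the port
# passed to digits-devserver:
# browser_address "$NODE_IP" 56841
# firefox "http://$(browser_address "$NODE_IP" 56841)" &
```

The http:// prefix in the last line is an assumption about how the development server is reached; entering the bare address in Firefox normally works as well.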

You should now see the interactive web GUI, as pictured below.

[Image: DIGITS web GUI]

Using the DIGITS RESTful API

You can interact with DIGITS from the command line by using the RESTful API via curl commands. When doing so, make sure that your web traffic is not going through one of the NIH HPC proxy servers. Run the following command to ensure that this is the case:

[user@cn0605 ~]$ proxyoff
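To double-check the result, you could inspect the proxy variables that curl honors. This sketch assumes the proxy is configured through the conventional http_proxy and https_proxy environment variables, which this page does not confirm; proxy_vars_set is a hypothetical helper name.

```shell
# Return success (0) if any conventional proxy variable is set and non-empty.
# Assumption: the proxy is configured through these variables.
proxy_vars_set() {
    for name in http_proxy https_proxy HTTP_PROXY HTTPS_PROXY; do
        eval "value=\${$name:-}"
        [ -n "$value" ] && return 0
    done
    return 1
}

if proxy_vars_set; then
    echo "A proxy variable is still set; run proxyoff first."
else
    echo "No proxy variables set."
fi
```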

For a quick demonstration showing how to interact with DIGITS via the API, expand the tab below.

DIGITS RESTful API demo
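As a rough sketch of what such a curl call might look like, the functions below build a URL from the node address and port and request a JSON listing of jobs. The names digits_api_url and digits_list_jobs are hypothetical, and the /index.json endpoint is an assumption to verify against the API documentation for your DIGITS version. The curl option --noproxy '*' bypasses any proxy for that single request.

```shell
# Build the URL for a DIGITS REST endpoint from host, port, and path.
digits_api_url() {
    printf 'http://%s:%s/%s\n' "$1" "$2" "$3"
}

# Request a JSON listing from the server, bypassing any proxy for this call.
# Assumption: the server exposes /index.json; check your DIGITS version.
digits_list_jobs() {
    curl --silent --noproxy '*' "$(digits_api_url "$1" "$2" index.json)"
}

# Example call with the node address and port used when starting the server
# (NODE_IP is a variable you would set yourself):
# digits_list_jobs "$NODE_IP" 56841
```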