Biowulf High Performance Computing at the NIH
Jupyter on Biowulf

Jupyter is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

Common pitfalls

This is a list of common pitfalls for Jupyter users on the cluster. Some of them are discussed in more detail in other sections of this documentation.

PermissionError: [Errno 13] Permission denied: '/run/user/xxxx'
This can be fixed by unsetting the $XDG_RUNTIME_DIR variable:
$ unset XDG_RUNTIME_DIR    # (bash)
$ unsetenv XDG_RUNTIME_DIR # (tcsh)
The cause of this error is that the $XDG_RUNTIME_DIR variable exported from the login node points to a directory that does not exist on the compute node. The problem appears only under some circumstances.
Expected packages are missing from my python kernel
Jupyter itself is installed in its own conda environment, which also provides the default kernel for new notebooks. This environment, however, does not include the commonly used scientific stack of packages. Instead, please use one of the kernels corresponding to the centrally installed applications (e.g. python/3.7).
Other python modules
No python or R modules need to be loaded. The main python and R installations are available as pre-configured kernels in our jupyter setup.
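For the Permission denied pitfall above, the following is a minimal sketch for bash (or another POSIX shell), assuming you only want to unset the variable when the directory it names is actually missing on the current node. The function name fix_xdg_runtime_dir is hypothetical and not part of the Biowulf setup.

```shell
# Hypothetical helper (not provided by Biowulf): unset XDG_RUNTIME_DIR only
# if it is set and points to a directory that does not exist on this node.
fix_xdg_runtime_dir() {
    if [ -n "${XDG_RUNTIME_DIR:-}" ] && [ ! -d "$XDG_RUNTIME_DIR" ]; then
        echo "XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR does not exist; unsetting it" >&2
        unset XDG_RUNTIME_DIR
    fi
}

# Run it once on the compute node before starting jupyter.
fix_xdg_runtime_dir
```

If the directory exists, the variable is left untouched, so the helper should be safe to call in every session.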
Starting a Jupyter Instance and connecting from your local browser

In order to connect to a jupyter notebook running on a compute node with the browser on your computer, it is necessary to establish a tunnel from your computer to biowulf and from biowulf to the compute node. sinteractive can automatically create the second leg of this tunnel (from biowulf to the compute node) when started with the -T/--tunnel option. For more details and information on how to set up the first leg of the tunnel (from your computer to biowulf), see our tunneling documentation.

Note that the python environment hosting the jupyter install is a minimal python environment. Please use the fully featured kernels named identically to their modules (e.g. python/3.7 or R/4.0). The kernels set up their environment so there is no need to load separate R or python modules before starting jupyter.

Allocate an interactive session and start a jupyter instance as shown below. First, we launch tmux (or screen) on the login node so that we don't lose our session if our connection to the login node drops.

[user@biowulf]$ module load tmux # You can use screen instead; you don't need to module load it
[user@biowulf]$ tmux
[user@biowulf]$ sinteractive --gres=lscratch:5 --mem=10g --tunnel
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
                                                                           
Created 1 generic SSH tunnel(s) from this compute node to                  
biowulf for your use at port numbers defined                               
in the $PORTn ($PORT1, ...) environment variables.                         
                                                                           
                                                                           
Please create a SSH tunnel from your workstation to these ports on biowulf.
On Linux/MacOS, open a terminal and run:                                   
                                                                           
    ssh  -L 33327:localhost:33327 biowulf.nih.gov                          
                                                                           
For Windows instructions, see https://hpc.nih.gov/docs/tunneling           

[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load jupyter
[user@cn3144]$ cp ${JUPYTER_TEST_DATA:-none}/* .
[user@cn3144]$ ls -lh
total 196K
-rw-r--r-- 1 user group 7.8K Oct  1 14:31 Pokemon.csv
-rw-r--r-- 1 user group 186K Oct  1 14:31 Seaborn_test.ipynb

[user@cn3144]$ jupyter kernelspec list
Available kernels:
  python3          /usr/local/Anaconda/envs/jupyter/lib/python3.6/site-packages/ipykernel/resources
  bash             /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/bash
  calysto_xonsh    /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/calysto_xonsh
  ir35             /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/ir35
  ir36             /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/ir36
  ir40             /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/ir40
  py2.7            /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/py2.7
  py3.6            /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/py3.6
  py3.7            /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/py3.7
  py3.8            /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/py3.8

In the example shown here, we use the convenient environment variable $PORT1, which expands to the port reserved by sinteractive (33327 in this example). You can also use the port number directly. Brief explanation of the options:

notebook|lab|console
Start a jupyter notebook, jupyter lab, or jupyter console
--ip localhost
listen on localhost only
--port $PORT1
listen on $PORT1, the variable set when using --tunnel for sinteractive
--no-browser
We're not running the browser on the compute node!
[user@cn3144]$ jupyter notebook --ip localhost --port $PORT1 --no-browser 
[I 12:48:25.645 NotebookApp] [nb_conda_kernels] enabled, 20 kernels found
[I 12:48:26.053 NotebookApp] [nb_anacondacloud] enabled
[I 12:48:26.077 NotebookApp] [nb_conda] enabled
[I 12:48:26.322 NotebookApp] ✓ nbpresent HTML export ENABLED
[W 12:48:26.323 NotebookApp] ✗ nbpresent PDF export DISABLED: No module named nbbrowserpdf.exporters.pdf
[I 12:48:26.330 NotebookApp] Serving notebooks from local directory: /spin1/users/user
[I 12:48:26.330 NotebookApp] 0 active kernels 
[I 12:48:26.330 NotebookApp] The Jupyter Notebook is running at: http://localhost:33327/?token=xxxxxxxxxx
[I 12:48:26.331 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 12:48:26.333 NotebookApp] 
    
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:33327/?token=xxxxxxxxxx

Keep this open for as long as you're using your notebook.
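Because $PORT1 is only set when sinteractive was started with --tunnel, a small guard can catch a missing tunnel before jupyter starts. The wrapper below is a sketch under that assumption; the name start_jupyter is hypothetical, and the jupyter options are the ones explained above.

```shell
# Hypothetical wrapper (not provided by Biowulf): refuse to start jupyter
# when $PORT1 is unset, i.e. when no tunnel port was reserved.
# The optional first argument selects notebook (default), lab, or console.
start_jupyter() {
    if [ -z "${PORT1:-}" ]; then
        echo "PORT1 is not set; was sinteractive started with --tunnel?" >&2
        return 1
    fi
    jupyter "${1:-notebook}" --ip localhost --port "$PORT1" --no-browser
}
```

On the compute node, after module load jupyter, you would call start_jupyter or start_jupyter lab.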

For documentation on how to connect a browser on your local computer to your jupyter instance on a biowulf compute node via a tunnel, see our general tunneling documentation.
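On Linux/MacOS, the first leg of the tunnel is the ssh command that sinteractive prints in the session output above. As an illustration only, the hypothetical helper tunnel_cmd below prints that command for a given port so you can check it before running it on your workstation. It assumes the same port number is used on both ends, as in the example.

```shell
# Hypothetical helper, meant for your workstation (not for biowulf): print
# the ssh command that forwards a local port to the same port on biowulf.
# Use the port reported by sinteractive (33327 in the example above).
tunnel_cmd() {
    port="$1"
    case "$port" in
        ''|*[!0-9]*) echo "usage: tunnel_cmd PORT" >&2; return 1 ;;
    esac
    echo "ssh -L ${port}:localhost:${port} biowulf.nih.gov"
}

tunnel_cmd 33327
```

Once the tunnel is open, the URL printed by jupyter (http://localhost:33327/?token=... in the example) can be pasted into your local browser.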

Selecting a kernel

In the notebook interface, the kernels highlighted in red below correspond to the identically named modules on the command line. Note that no modules other than jupyter have to be loaded; the kernels do all the required setup:

[Figure: Select notebook kernels]

Similarly, in the JupyterLab interface:

[Figure: Select lab kernels]

Kernels may change as new python or R installations are added or old ones are retired. Look for kernels with the same name as command line modules.
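Since the set of kernels changes over time, it can help to list just the kernel names currently installed. The filter below is a sketch that assumes the two-column output format of jupyter kernelspec list shown earlier (kernel name, then an absolute path); the function name kernel_names is hypothetical.

```shell
# Hypothetical filter (not provided by Biowulf): read the output of
# "jupyter kernelspec list" on stdin and print only the kernel names.
# Lines are kept only if they have two fields and the second is an
# absolute path, which skips the "Available kernels:" header.
kernel_names() {
    awk 'NF == 2 && $2 ~ /^\// { print $1 }'
}

# Usage on the compute node, after "module load jupyter":
#   jupyter kernelspec list | kernel_names
```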