Biowulf High Performance Computing at the NIH
Jupyter on Biowulf

Jupyter is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

Starting a Jupyter Instance

A Jupyter instance can be started from either a batch job or an interactive job; this guide demonstrates the latter. Note that the Python environment hosting the Jupyter installation is a minimal environment; use the pyX.X kernels to get the fully featured Python environments.
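
For reference, a batch submission could look roughly like the sketch below. This is a hypothetical minimal example, not an officially supported recipe: the script name (jupyter_batch.sh), the walltime, and the output handling are assumptions. After the job starts, look up the node name and port in the Slurm output file (slurm-<jobid>.out) and tunnel to that node as described in the sections below.

#!/bin/bash
# hypothetical batch script (jupyter_batch.sh); adjust resources and walltime to your needs
#SBATCH --job-name=jupyter
#SBATCH --gres=lscratch:5
#SBATCH --time=08:00:00

module load jupyter
cd /lscratch/$SLURM_JOB_ID
port=$(( RANDOM + 1024 ))                            # random port, as in the interactive example
echo "jupyter running on $(hostname), port ${port}"  # recorded in the Slurm output file
jupyter notebook --ip localhost --port ${port} --no-browser

Submit it with sbatch jupyter_batch.sh; the token URL printed by Jupyter also ends up in the Slurm output file.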

Allocate an interactive session and start a jupyter instance as shown below. First, we launch tmux (or screen) on the login node so that we don't lose our session if our connection to the login node drops.

[user@biowulf]$ module load tmux # You can use screen instead; you don't need to module load it
[user@biowulf]$ tmux
[user@biowulf]$ sinteractive --gres=lscratch:5
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ cp ${JUPYTER_TEST_DATA:-none}/* .
[user@cn3144]$ ls -lh
total 196K
-rw-r--r-- 1 user group 7.8K Oct  1 14:31 Pokemon.csv
-rw-r--r-- 1 user group 186K Oct  1 14:31 Seaborn_test.ipynb

[user@cn3144]$ module load jupyter
[user@cn3144]$ jupyter kernelspec list
Available kernels:
  python3          /usr/local/Anaconda/envs/jupyter/lib/python3.6/site-packages/ipykernel/resources
  bash             /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/bash
  calysto_xonsh    /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/calysto_xonsh
  ir35             /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/ir35
  py2.7            /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/py2.7
  py3.5            /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/py3.5
  py3.6            /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/py3.6
  xonsh            /usr/local/Anaconda/envs/jupyter/share/jupyter/kernels/xonsh

[user@cn3144]$ # let's pick a random port to make conflicts less likely (see note below)
[user@cn3144]$ # use the random port, *not* the port from this example
[user@cn3144]$ port=$(( RANDOM + 1024 ))
[user@cn3144]$ echo $port
1689
[user@cn3144]$ # jupyter lab or jupyter console can be started the same way
[user@cn3144]$ # the port must be unique to avoid clashing with other users
[user@cn3144]$ # --no-browser: we're not running the browser on the compute node!
[user@cn3144]$ jupyter notebook --ip localhost \
                        --port 1689 \
                        --no-browser
[I 12:48:25.645 NotebookApp] [nb_conda_kernels] enabled, 20 kernels found
[I 12:48:26.053 NotebookApp] [nb_anacondacloud] enabled
[I 12:48:26.077 NotebookApp] [nb_conda] enabled
[I 12:48:26.322 NotebookApp] ✓ nbpresent HTML export ENABLED
[W 12:48:26.323 NotebookApp] ✗ nbpresent PDF export DISABLED: No module named nbbrowserpdf.exporters.pdf
[I 12:48:26.330 NotebookApp] Serving notebooks from local directory: /spin1/users/user
[I 12:48:26.330 NotebookApp] 0 active kernels 
[I 12:48:26.330 NotebookApp] The Jupyter Notebook is running at: http://localhost:1689/?token=d080ef1c11b196e2dfc77425c7c61cb150d516eafd458178
[I 12:48:26.331 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 12:48:26.333 NotebookApp] 
    
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:1689/?token=d080ef1c11b196e2dfc77425c7c61cb150d516eafd458178

Keep this open for as long as you're using your notebook. Note: In the unlikely event that this port is already in use on the compute node, please select another random port.
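
If you want to confirm that the randomly chosen port is actually free before starting Jupyter, a small shell loop along the following lines can help. This is only a sketch; it assumes the ss utility (part of iproute2 on most Linux systems) is available on the compute node.

[user@cn3144]$ # keep drawing random ports until one is not already in use on this node
[user@cn3144]$ port=$(( RANDOM + 1024 ))
[user@cn3144]$ while ss -tln | grep -q ":${port} "; do port=$(( RANDOM + 1024 )); done
[user@cn3144]$ echo $port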
Connecting to Your Jupyter Instance

Connecting your local browser to the Jupyter instance on a compute node requires an SSH tunnel. How to set up the tunnel depends on your local workstation. If the port selected above is already in use on Biowulf, you will get an error; select another random port, restart the Jupyter notebook on that port, and try tunneling again.

Linux/Mac

From your local machine, do the following:

[user@workstation]$ ssh -t -L 1689:localhost:1689 biowulf.nih.gov ssh -N -L 1689:localhost:1689 cn3144
        

After entering your password, nothing will appear to happen because of the -N option, but the tunnel has been established and will persist until you interrupt it with Ctrl-C. Leave it open until you are done working with your notebook. The link printed when you invoked Jupyter above will now work if you paste it into your web browser. If the tunnel connection drops (or you accidentally close the window), you can re-create it the same way without losing any work, as long as your Slurm job is still alive.
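
If you find yourself re-creating the tunnel often, you could wrap the one-liner in a small shell function on your workstation. This is just a convenience sketch; the function name jutunnel is made up, and the port and node name must match your current job.

[user@workstation]$ # hypothetical helper, e.g. in your ~/.bashrc: jutunnel <port> <node>
[user@workstation]$ jutunnel() { ssh -t -L "$1:localhost:$1" biowulf.nih.gov ssh -N -L "$1:localhost:$1" "$2"; }
[user@workstation]$ jutunnel 1689 cn3144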

Windows

Setting up a tunnel from your desktop to a compute node with PuTTY is a two-step process. First, set up a tunnel to Biowulf:

Enter biowulf.nih.gov into the Host field as usual

putty tunnel illustration 1

Then go to Tunnels and add the information shown below into the Source port and Destination fields, replacing the port number with the port you used. Click Add (1). After that, you can either save the connection on the PuTTY home screen and then click Open, or click Open right away.

putty tunnel illustration 2

Once logged into Biowulf, execute an ssh command to complete the tunnel to the compute node. Note that we are using the -N option, which means you won't get a prompt on the login node, but the tunnel will still be active.

[user@biowulf]$ ssh -N -L 1689:localhost:1689 cn3144
        

Now the link displayed during the Jupyter notebook startup will work in your local browser.

Video Examples

These videos demonstrate the process end to end; see the sections above for a written description. Note that there is now a separate jupyter module with access to multiple kernels, including kernels for all the Python environments.

Linux/Mac
Windows with PuTTY