Apptainer is a tool allowing you to build and run Linux containers. Linux containers can be thought of as small, lightweight virtual machines encapsulating an entire operating system. Containers let users run applications in a Linux environment of their choosing.
Possible uses for Apptainer on Biowulf include running applications built for other Linux distributions, running containers pulled from Docker Hub, and packaging portable, reproducible scientific workflows.
Please note, Apptainer gives you the ability to install and run applications in your own Linux environment with your own customized software stack. With this ability comes the added responsibility of managing your own Linux environment. While the NIH HPC staff can provide guidance on how to create and use Apptainer containers, we do not have the resources to manage containers for individual users. If you decide to use Apptainer, it is your responsibility to manage your own containers.
By default, Apptainer caches downloaded images under $HOME/.apptainer, which can quickly exhaust your home directory quota. Point the cache at your data directory instead:

export APPTAINER_CACHEDIR=/data/${USER}/.apptainer
Like most scientific applications, Apptainer cannot be run on the Biowulf login node. To run an Apptainer container image on Biowulf interactively, you need to allocate an interactive session, and load the Apptainer module (user input in bold).
[user@biowulf ~]$ sinteractive -c4 --mem=8g --gres=lscratch:10
salloc: Pending job allocation 35492498
salloc: job 35492498 queued and waiting for resources
salloc: job 35492498 has been allocated resources
salloc: Granted job allocation 35492498
salloc: Waiting for resource configuration
salloc: Nodes cn0991 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.35492498.0
slurmstepd: error: x11: unable to read DISPLAY value

[user@cn0991 ~]$ module load apptainer
[+] Loading apptainer 1.0.1 on cn0991

[user@cn0991 ~]$
You can download containers using the pull command. Apptainer containers are saved on disk as image files. In this example, we use the library:// URI to download containers from the Sylabs cloud library.
[user@cn0991 35492498]$ apptainer pull library://godlovedc/funny/lolcow
INFO:    Downloading library image
89.2MiB / 89.2MiB [============================================================================] 100 % 18.3 MiB/s 0s
WARNING: integrity: signature not found for object group 1
WARNING: Skipping container verification

[user@cn0991 35492498]$ ls -lh
total 90M
-rwxr-xr-x 1 user user 90M Apr  4 12:27 lolcow_latest.sif

[user@cn0991 35492498]$
You can also use the docker:// URI to get this container from Docker Hub. In this example, we added the --force option so that we can overwrite the container we downloaded from the Sylabs cloud library. This also produces a lot of standard output (abbreviated here) as the container is converted to Apptainer format.
[user@cn0991 35492498]$ apptainer pull --force docker://godlovedc/lolcow
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 9fb6c798fa41 done
Copying blob 3b61febd4aef done
Copying blob 9d99b9777eb0 done
Copying blob d010c8cf75d7 done
Copying blob 7fac07fb303e done
Copying blob 8e860504ff1e done
Copying config 73d5b1025f done
Writing manifest to image destination
Storing signatures
2022/04/04 12:35:25  info unpack layer: sha256:9fb6c798fa41e509b58bccc5c29654c3ff4648b608f5daa67c1aab6a7d02c118
2022/04/04 12:35:25  warn rootless{dev/agpgart} creating empty file in place of device 10:175
[...snip]
2022/04/04 12:35:25  warn rootless{dev/zero} creating empty file in place of device 1:5
2022/04/04 12:35:27  info unpack layer: sha256:3b61febd4aefe982e0cb9c696d415137384d1a01052b50a85aae46439e15e49a
2022/04/04 12:35:27  info unpack layer: sha256:9d99b9777eb02b8943c0e72d7a7baec5c782f8fd976825c9d3fb48b3101aacc2
2022/04/04 12:35:27  info unpack layer: sha256:d010c8cf75d7eb5d2504d5ffa0d19696e8d745a457dd8d28ec6dd41d3763617e
2022/04/04 12:35:27  info unpack layer: sha256:7fac07fb303e0589b9c23e6f49d5dc1ff9d6f3c8c88cabe768b430bdb47f03a9
2022/04/04 12:35:27  info unpack layer: sha256:8e860504ff1ee5dc7953672d128ce1e4aa4d8e3716eb39fe710b849c64b20945
INFO:    Creating SIF file...

[user@cn0991 35492498]$ ls -lh
total 88M
-rwxr-xr-x 1 user user 88M Apr  4 12:35 lolcow_latest.sif

[user@cn0991 35492498]$
You can "run" your newly downloaded container either using the run command, or by treating the container as an executable file and supplying its path.
[user@cn0991 35492498]$ apptainer run lolcow_latest.sif
 ___________________________________
< You will triumph over your enemy. >
 -----------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

[user@cn0991 35492498]$ ./lolcow_latest.sif
 _________________________________________
/ It is so very hard to be an             \
| on-your-own-take-care-of-yourself-becau |
| se-there-is-no-one-else-to-do-it-for-yo |
\ u grown-up.                             /
 -----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

[user@cn0991 35492498]$
You can start a new shell session within the container you downloaded using the shell command. This is useful if you want to look around and find things inside the container. Note that the operating system inside the container is Ubuntu 16.04 (codenamed "Xenial Xerus"). Also note that you must execute the exit command to quit the shell session within the container when you are finished.
[user@cn0991 35492498]$ apptainer shell lolcow_latest.sif
Apptainer> which cowsay
/usr/games/cowsay
Apptainer> cowsay moo
 _____
< moo >
 -----
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
Apptainer> cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.3 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.3 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Apptainer> exit
exit

[user@cn0991 35492498]$
The exec command is useful for executing a single command inside the container.
[user@cn0991 35492498]$ apptainer exec lolcow_latest.sif fortune
Be security conscious -- National defense is at stake.

[user@cn0991 35492498]$
Good advice.
Finally, you can use the run, exec, and shell commands directly on a container URI without pulling the container first. In this example we use the shell command to start a shell session within the official Docker Hub alpine container, thereby switching to the Alpine operating system with a single command.
[user@cn0991 35492498]$ apptainer shell docker://alpine
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 40e059520d19 done
Copying config 90d288e0c9 done
Writing manifest to image destination
Storing signatures
2022/04/04 14:08:58  info unpack layer: sha256:40e059520d199e1a1a259089077f2a0c879951c9a4540490bad3a0d7714c6ae7
INFO:    Creating SIF file...
Apptainer> cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.15.3
PRETTY_NAME="Alpine Linux v3.15"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"
Apptainer> exit

[user@cn0991 35492498]$
This is a very small sample of Apptainer's vast capabilities. See the official user documentation for a thorough introduction.
Binding a directory to your Apptainer container allows you to read and write files on the host system from within your container. By default, Apptainer binds your $HOME directory (along with a few other directories such as /tmp and /dev). You can also bind other directories into your container yourself. The process is described in detail in the Apptainer documentation.
There are several filesystems on the NIH HPC systems that you may want to access from within your container. If you are running a job and have allocated local scratch space, you might like to bind your /lscratch into the container as well. You can bind directories into your container at runtime using either the --bind option or by setting the $APPTAINER_BINDPATH environment variable.
The following command opens a shell in a container while bind-mounting your /data directory, /fdb, and /lscratch into the container. If you have access to shared data directories, you could add them to the list as well (for example, /data/$USER,/data/mygroup1,/data/mygroup2,/fdb,...).
[user@cn1234 ~]$ apptainer shell --bind /data/$USER,/fdb,/lscratch my-container.sif

or, using the environment variable:
[user@cn1234 ~]$ export APPTAINER_BINDPATH="/data/$USER,/fdb,/lscratch"
[user@cn1234 ~]$ apptainer shell my-container.sif
The NIH HPC staff maintains a file that will set the $APPTAINER_BINDPATH environment variable appropriately for a wide variety of situations. It is considered a Biowulf best practice to source this file, since it will be updated when shared file systems are added or removed. You can source this file from the command prompt or from within a script like so:
[user@cn1234 ~]$ . /usr/local/current/apptainer/app_conf/sing_binds
Sourcing this file instead of setting bind paths yourself can help to "future-proof" your workflow since it is maintained by NIH HPC staff.
Remember that you may need to change the directories that you bind into the container if you use your container on a different system or share it with a colleague.
To leverage the hardware on a GPU node, you must pass the --nv option with your Apptainer command. Consider the following example in which a toy deep learning model is trained on a GPU using a tensorflow container:
[user@biowulf ~]$ sinteractive --constraint=gpuk80 -c28 --mem=32g --gres=gpu:k80:1,lscratch:20
salloc: Pending job allocation 35509918
salloc: job 35509918 queued and waiting for resources
salloc: job 35509918 has been allocated resources
salloc: Granted job allocation 35509918
salloc: Waiting for resource configuration
salloc: Nodes cn4194 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.35509918.0
slurmstepd: error: x11: unable to read DISPLAY value

[user@cn4194 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn4194 35509918]$ cat >hello-tflow.py<<'EOF'
import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
EOF

[user@cn4194 35509918]$ module load apptainer
[+] Loading apptainer 1.0.1 on cn4194
[user@cn4194 35509918]$ . /usr/local/current/apptainer/app_conf/sing_binds
[user@cn4194 35509918]$ apptainer exec --nv docker://tensorflow/tensorflow:latest-gpu python hello-tflow.py
INFO:    Using cached SIF image
2022-04-04 16:39:07.555227: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-04-04 16:39:09.893898: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 11560 MB memory:  -> device: 0, name: Tesla K80, pci bus id: 0000:84:00.0, compute capability: 3.7
Epoch 1/5
1875/1875 [==============================] - 6s 2ms/step - loss: 0.2960 - accuracy: 0.9143
Epoch 2/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1387 - accuracy: 0.9583
Epoch 3/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1043 - accuracy: 0.9683
Epoch 4/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0861 - accuracy: 0.9724
Epoch 5/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0752 - accuracy: 0.9763
313/313 [==============================] - 1s 2ms/step - loss: 0.0735 - accuracy: 0.9766

[user@cn4194 35509918]$
More detailed information about GPU support in Apptainer can be obtained from the official docs.
This example assumes that you have the script from the GPU example above saved in your /data directory. To create it, execute the following commands:
[user@biowulf ~]$ cd /data/$USER
[user@biowulf user]$ cat >hello-tflow.py<<'EOF'
import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
EOF

[user@biowulf user]$
Then write a batch script to run the apptainer command similar to this:
#!/bin/bash
# file called myjob.batch

set -e

module load apptainer
cd /data/$USER
. /usr/local/current/apptainer/app_conf/sing_binds
apptainer exec --nv docker://tensorflow/tensorflow:latest-gpu \
    python /data/${USER}/hello-tflow.py
Submit the job like so:
[user@biowulf user]$ sbatch --time=10 --cpus-per-task=28 --partition=gpu --mem=32g --gres=gpu:k80:1 myjob.batch
35686040

[user@biowulf user]$
After the job finishes executing, you should see output like the following in the slurm*.out file.
[user@biowulf user]$ cat slurm-35686040.out
[+] Loading apptainer 1.0.1 on cn4181
INFO:    Using cached SIF image
2022-04-06 12:27:30.092624: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-04-06 12:27:37.983259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 11560 MB memory:  -> device: 0, name: Tesla K80, pci bus id: 0000:8b:00.0, compute capability: 3.7
Epoch 1/5
1875/1875 [==============================] - 12s 2ms/step - loss: 0.3009 - accuracy: 0.9131
Epoch 2/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1458 - accuracy: 0.9569
Epoch 3/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1093 - accuracy: 0.9670
Epoch 4/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0871 - accuracy: 0.9732
Epoch 5/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0750 - accuracy: 0.9768
313/313 [==============================] - 1s 2ms/step - loss: 0.0802 - accuracy: 0.9764

[user@biowulf user]$
The example above shows how to run a script within a containerized environment on Biowulf. This example can be easily extended to a swarm file with a series of commands. Let's assume you have a list of python scripts that you want to run in a custom python environment.
First, source the sing_binds file to set the $APPTAINER_BINDPATH variable appropriately in your environment. This variable will propagate to your jobs so that your Apptainer container will have access to files on the host system.
[user@biowulf ~]$ . /usr/local/current/apptainer/app_conf/sing_binds [user@biowulf ~]$
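This propagation is ordinary environment inheritance: any variable exported in your submitting shell is visible to child processes, and Slurm forwards the submitting shell's environment to batch jobs the same way. A minimal local sketch (the path list here is hypothetical; on Biowulf the sing_binds file assembles the real one):

```shell
# Export a bind path list; the value below is only an illustration.
export APPTAINER_BINDPATH="/data/someuser,/fdb,/lscratch"

# A fresh child shell (standing in for a Slurm job step) inherits it.
child=$(bash -c 'echo "child sees: $APPTAINER_BINDPATH"')
echo "$child"
```
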
Create a swarmfile (e.g. apptainer.swarm). For example:
apptainer exec docker://python python /data/${USER}/script1.py
apptainer exec docker://python python /data/${USER}/script2.py
apptainer exec docker://python python /data/${USER}/script3.py
apptainer exec docker://python python /data/${USER}/script4.py
Submit this job using the swarm command.
[user@biowulf ~]$ swarm -f apptainer.swarm [-g #] [-t #] --module apptainer

where
-g #                 Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #                 Number of threads/CPUs required for each process (1 line in the swarm command file)
--module apptainer   Loads the apptainer module for each subjob in the swarm
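For example, a submission requesting 4 GB of memory and 2 CPUs per swarm command might look like this (the values are illustrative; size them to your scripts' actual needs):

```
[user@biowulf ~]$ swarm -f apptainer.swarm -g 4 -t 2 --module apptainer
```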
A few containers have caused issues on Biowulf by triggering a kernel-level bug described in detail here and here. These include fmriprep and nanodisco. The problems follow a predictable pattern:
[user@cn1234 ~]$ apptainer build --sandbox container_name container_name.sif

Please contact staff@hpc.nih.gov with questions.
To use Apptainer on Biowulf, you either need to use a pre-built container created by someone else, or build your own container. Building a container from a definition file requires elevated privileges, so containers can't be built on the NIH HPC systems. You have several options to build Apptainer containers:
You can find information about installing Apptainer on Linux here.
In addition to your own Linux environment, you will also need a definition file to build an Apptainer container from scratch. You can find some simple definition files for a variety of Linux distributions in the /example directory of the source code. You can also find links to definition files containing popular applications in the Documentation section above. Detailed documentation about building Apptainer container images is available at the Apptainer website.
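As a sketch, a minimal definition file for an Ubuntu-based container might look like the following (the base image and package choices here are only illustrative):

```
Bootstrap: docker
From: ubuntu:22.04

%post
    # install whatever your application needs
    apt-get update
    apt-get install -y --no-install-recommends curl ca-certificates

%runscript
    # command executed by `apptainer run`
    exec curl --version
```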
One can use Apptainer to "install" software and use it transparently as though it were installed directly on the host system. In fact, NIH HPC staff members use Apptainer to install a large number of scientific applications. This method can make it easier to install software and renders the final product portable. For example, if you wanted to use the default Debian package manager (APT) to "install" software on Biowulf you could do something like this. Here we install samtools and bcftools, with the following definition file:
Bootstrap: docker
From: debian:9-slim

%post
    # install the desired software
    apt-get update
    apt-get install -y samtools bcftools
    apt-get clean

This defines a container based on the space-efficient "slim" Debian images from Docker Hub and installs the samtools and bcftools packages via APT.
After finalizing the definition file, you can proceed to build the container (of course, on a system where you have sudo or root access):
[user@some_build_host ~]$ sudo apptainer build hts.sif hts.def
You can then set up your installation prefix (here, it's $HOME/opt/hts) as follows, making use of symbolic links and a wrapper script:
$HOME/opt
└── hts
    ├── bin
    │   ├── samtools -> ../libexec/wrap
    │   └── bcftools -> ../libexec/wrap
    └── libexec
        ├── wrap
        └── hts.sif

where the wrapper script wrap looks like:
#!/bin/bash
. /usr/local/current/apptainer/app_conf/sing_binds
selfdir="$(dirname $(readlink -f ${BASH_SOURCE[0]}))"
cmd="$(basename $0)"
apptainer exec "${selfdir}/hts.sif" "$cmd" "$@"

wrap checks to see how it was called, then passes that same command to the container after appropriately setting $APPTAINER_BINDPATH by sourcing the staff-maintained sing_binds script.
So if you have added the installation prefix $HOME/opt/hts/bin to your PATH, calling samtools or bcftools will run those programs from within your container. And because we have arranged to bind mount all the necessary filesystems into the container, the path names you pass to the programs for input and output will resolve inside the container in the same way.
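The name-based dispatch that wrap relies on can be seen in isolation with a small stand-in script that echoes instead of calling apptainer (the temporary directory layout and tool names below are just for illustration):

```shell
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/bin" "$tmp/libexec"

# Stand-in for wrap: report which name it was invoked under.
cat > "$tmp/libexec/wrap" <<'EOF'
#!/bin/bash
cmd="$(basename "$0")"
echo "would run '$cmd' inside the container"
EOF
chmod +x "$tmp/libexec/wrap"

# One symlink per tool, exactly as in the $HOME/opt/hts layout above.
ln -s ../libexec/wrap "$tmp/bin/samtools"
ln -s ../libexec/wrap "$tmp/bin/bcftools"

out1=$("$tmp/bin/samtools")   # dispatches as "samtools"
out2=$("$tmp/bin/bcftools")   # dispatches as "bcftools"
echo "$out1"
echo "$out2"
rm -rf "$tmp"
```

Because `$0` is the symlink path, `basename` recovers the tool name, and the real wrap forwards that name into the container unchanged.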
In this example, we will create an Apptainer container image starting from the official continuumio miniconda container on Docker Hub. Then we'll install a number of RNASeq tools. This would allow us to write a pipeline with, for example, Snakemake and distribute it along with the image to create an easily shared, reproducible workflow. This definition file also installs a runscript enabling us to treat our container like an executable.
BootStrap: docker
From: continuumio/miniconda:latest
IncludeCmd: yes

%post
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # this will install all necessary packages and prepare the container
    apt-get -y update --allow-releaseinfo-change
    apt-get -y install make gcc zlib1g-dev libncurses5-dev

    wget https://github.com/samtools/samtools/releases/download/1.3.1/samtools-1.3.1.tar.bz2 \
        && tar -xjf samtools-1.3.1.tar.bz2 \
        && cd samtools-1.3.1 \
        && make \
        && make prefix=/usr/local install

    export PATH=/opt/conda/bin:$PATH
    conda install --yes -c bioconda \
        star \
        sailfish \
        fastqc \
        kallisto \
        subread
    conda clean --index-cache --tarballs --packages --yes

    mkdir /data /resources

%runscript
    #!/bin/bash
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # this code will run whenever the container
    # is called as an executable or with `apptainer run`
    function usage() {
        cat <<EOF
NAME
    rnaseq - rnaseq pipeline tools 0.1

SYNOPSIS
    rnaseq tool [tool options]
    rnaseq list
    rnaseq help

DESCRIPTION
    Apptainer container with tools to build rnaseq pipeline.
EOF
    }

    function tools() {
        echo "conda: $(which conda)"
        echo "---------------------------------------------------------------"
        conda list
        echo "---------------------------------------------------------------"
        echo "samtools: $(samtools --version | head -n1)"
    }

    arg="${1:-none}"
    case "$arg" in
        none) usage; exit 1;;
        help) usage; exit 0;;
        list) tools; exit 0;;
        # just try to execute it then
        *) $@;;
    esac

%environment
    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # This sets global environment variables for anything run within the container
    export PATH="/opt/conda/bin:/usr/local/bin:/usr/bin:/bin:"
    unset CONDA_DEFAULT_ENV
    export ANACONDA_HOME=/opt/conda
Assuming this file is called rnaseq.def, we can create an Apptainer container called rnaseq on our build system with the following commands:
[user@some_build_system ~]$ sudo apptainer build rnaseq rnaseq.def
This image contains miniconda and our rnaseq tools and can be called directly as an executable like so:
[user@some_build_system ~]$ ./rnaseq help
NAME
    rnaseq - rnaseq pipeline tools 0.1

SYNOPSIS
    rnaseq snakemake [snakemake options]
    rnaseq list
    rnaseq help

DESCRIPTION
    Apptainer container with tools to build rnaseq pipeline.

[user@some_build_system ~]$ ./rnaseq list
conda: /opt/conda/bin/conda
---------------------------------------------------------------
# packages in environment at /opt/conda:
#
fastqc                    0.11.5                        1    bioconda
java-jdk                  8.0.92                        1    bioconda
kallisto                  0.43.0                        1    bioconda
sailfish                  0.10.1            boost1.60_1    bioconda
[...snip...]

[user@some_build_system ~]$ ./rnaseq samtools --version
samtools 1.3.1
Using htslib 1.3.1
Copyright (C) 2016 Genome Research Ltd.
After copying the image to the NIH HPC systems, allocate an sinteractive session and test it there:
[user@cn1234 ~]$ module load apptainer
[user@cn1234 ~]$ ./rnaseq list
conda: /opt/conda/bin/conda
---------------------------------------------------------------
# packages in environment at /opt/conda:
#
fastqc                    0.11.5                        1    bioconda
java-jdk                  8.0.92                        1    bioconda
kallisto                  0.43.0                        1    bioconda
sailfish                  0.10.1            boost1.60_1    bioconda
[...snip...]
This could be used with a Snakemake file like this:
rule fastqc:
    input: "{sample}.fq.gz"
    output: "{sample}.fastqc.html"
    shell:
        """
        module load apptainer
        ./rnaseq fastqc ... {input}
        """

rule align:
    input: "{sample}.fq.gz"
    output: "{sample}.bam"
    shell:
        """
        module load apptainer
        ./rnaseq STAR ....
        """