Conda on Biowulf

Conda is a cross-platform and language-independent user-level package manager. It is well-established in the bioinformatics community by virtue of the bioconda package repository.

See the personal software installation user guide for more about package managers and alternative approaches.

References:

Documentation
Important Notes

Common pitfalls

Status 403 error for conda defaults channels
If you are getting an error similar to
RuntimeError: Multi-download failed. Reason: Transfer finalized, status: 403
    [https://repo.anaconda.com/pkgs/r/noarch/repodata.json] 4020 bytes
	
then in most cases you can fix it by disabling the default channels and adding alternative channels. This can be done by adding the following lines to a .condarc file at the top level of the miniconda install (i.e. $CONDA_ROOT/.condarc). You might also want to set strict channel priority at the same time if you haven't already done so:
channels:
  - conda-forge
  - bioconda
defaults: []
channel_priority: strict
	
You should also ensure that your ~/.condarc does not explicitly set channels or defaults.
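As a minimal sketch, the configuration above could be written from the shell as follows. The helper name write_condarc and the scratch directory used in the demonstration are illustrative assumptions; on Biowulf you would pass the top level of your own conda install (for example /data/$USER/conda) instead.

```shell
# Hypothetical helper: write the channel configuration shown above to
# <conda_root>/.condarc, keeping a backup of any existing file.
write_condarc() {
    conda_root="$1"
    [ -d "$conda_root" ] || { echo "no such directory: $conda_root" >&2; return 1; }
    if [ -f "$conda_root/.condarc" ]; then
        cp "$conda_root/.condarc" "$conda_root/.condarc.bak"
    fi
    cat > "$conda_root/.condarc" <<'EOF'
channels:
  - conda-forge
  - bioconda
defaults: []
channel_priority: strict
EOF
}

# Demonstration against a scratch directory (assumption: replace this with
# your real install prefix, e.g. /data/$USER/conda).
demo_root="$(mktemp -d)"
write_condarc "$demo_root"
cat "$demo_root/.condarc"

# List any channel settings in ~/.condarc that could override the above.
grep -nE '^(channels|defaults|default_channels):' "$HOME/.condarc" 2>/dev/null || true
```

If the final grep prints anything, those lines in ~/.condarc are candidates for removal.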
Conda install filled up /home
We do not allow requests to increase the quota on home directories. A conda install with some moderately sized environments can fill up that space quickly. We therefore advise against installing conda in your home directory. /data is a better place.
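One way to guard against this in your own setup scripts is to check an install prefix before using it. The sketch below is illustrative only: the function name prefix_outside_home and the example paths are assumptions, not part of any Biowulf tool.

```shell
# Hypothetical guard: succeed only if the given install prefix lies
# outside the home directory.
prefix_outside_home() {
    case "$1" in
        "$HOME"|"$HOME"/*) return 1 ;;   # inside /home: counts against the quota
        *) return 0 ;;
    esac
}

# Example paths are assumptions for illustration.
for prefix in "$HOME/miniconda3" "/data/${USER:-user}/conda"; do
    if prefix_outside_home "$prefix"; then
        echo "ok:    $prefix"
    else
        echo "avoid: $prefix (inside home directory)"
    fi
done
```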
Conda activate added to shell init
Adding conda initialization to your .bashrc/.cshrc/.zshrc can cause problems with NoMachine/VNC (and other issues). We recommend not allowing conda to initialize during shell startup.
Conda environments and NoMachine problems
Related to the above, if you add the executables of the conda dbus package to your PATH in one of your startup files (e.g. ~/.bashrc), NoMachine may fail with a black screen. This can happen when you automatically activate your own conda installation therein. The solution is to remove any unnecessary initialization from your shell startup file. To do this manually, comment out or remove the lines between
# >>> conda initialize >>>
and
# <<< conda initialize <<<
Alternatively, reverse the initialization with conda init --reverse, or remove the dbus package from the environment.
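The manual approach can be sketched with sed. This is an illustration under stated assumptions: the helper name comment_out_conda_init is hypothetical, and the demonstration runs on a scratch file rather than a real startup file. Where available, conda init --reverse achieves the same result.

```shell
# Hypothetical helper: comment out every line between the conda
# initialize markers (inclusive), keeping a backup as <file>.bak.
comment_out_conda_init() {
    rc_file="$1"
    [ -f "$rc_file" ] || { echo "no such file: $rc_file" >&2; return 1; }
    cp "$rc_file" "$rc_file.bak"
    sed '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</ s/^/# /' \
        "$rc_file.bak" > "$rc_file"
}

# Demonstration on a scratch file standing in for ~/.bashrc.
demo_rc="$(mktemp)"
printf '%s\n' \
    'export FOO=1' \
    '# >>> conda initialize >>>' \
    'eval "$(conda shell.bash hook)"' \
    '# <<< conda initialize <<<' \
    'export BAR=2' > "$demo_rc"
comment_out_conda_init "$demo_rc"
cat "$demo_rc"
```

Lines outside the markers are left untouched, and the original file remains available as the .bak copy.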
Swarm and Conda environments
When activating one of your own environments within a swarm, some versions of conda may encounter a race condition that causes some swarm subjobs to fail to activate the environment correctly. This can be avoided either by activating the conda environment before submitting the swarm (the environment gets exported to the job) or by calling your program with the full path into your environment without activating it. Update: this should no longer happen, but the note is kept here in case there is a regression.
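The full-path approach could look like the following sketch, which only generates a swarm command file. The environment name project1, the script analyze.py, and the sample names are assumptions for illustration; the submission commands are left as comments.

```shell
# Assumed environment prefix; adjust to your own install.
ENV_PREFIX="/data/${USER:-user}/conda/envs/project1"

# Write one command per line, calling the environment's python by full
# path so that no activation is needed inside the subjobs.
swarm_file="$(mktemp)"
for sample in sampleA sampleB sampleC; do
    echo "$ENV_PREFIX/bin/python analyze.py $sample.bam"
done > "$swarm_file"
cat "$swarm_file"

# Submit on Biowulf (not run here):
#   swarm -f "$swarm_file"
#
# The other approach: activate first, then submit, so the activated
# environment is exported to the job:
#   source myconda
#   mamba activate project1
#   swarm -f commands.swarm     # hypothetical file calling plain `python`
```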
mamba_install wrapper

The mamba_install wrapper is a module that installs and configures mamba/conda environments for users. It is still under development; if you see any bugs, please contact staff@hpc.nih.gov. It has several functions:

  1. Installs conda at /data/$USER/conda (by default) or in a given directory. If the directory already exists, this step is skipped.
  2. Removes the "conda init" code from the shell startup file by calling conda init --reverse.
  3. Creates an init_file with the code to activate the conda env for your default shell. To activate, source the init_file in your shell session. If the init_file already exists, this step is skipped.

Some examples:
Install fresh conda env in /data/$USER/conda and generate conda init_file as ~/bin/myconda

    mamba_install
Install fresh conda env in /data/$USER/mymamba and generate conda init_file as ~/bin/mymamba
    mamba_install /data/$USER/mymamba --init-file=~/bin/mymamba
Remove the "conda init" code from the shell startup file when the conda env is at /data/$USER/conda
    mamba_install --comment-out-only
Add "conda init" to ~/bin/myminiconda for customized env
    mamba_install --init-only --shell=bash --init-file=~/bin/myminiconda /data/$USER/miniconda/

To load the mamba_install module in an interactive session:
[user@biowulf]$ sinteractive --mem=20g --gres=lscratch:20
[user@cn3444]$ module load mamba_install
[user@cn3444]$ mamba_install -h
NAME
    mamba_install - Install and configure mamba + conda forge

SYNOPSIS
    mamba_install [OPTIONS] [mamba_directory]

DESCRIPTION
    Installs conda/mamba following best practices for NIH HPC as
    described at
        https://hpc.nih.gov/docs/diy_installation/conda.html

    The default install location is /data/$USER/conda and installs in the home
    directory are not allowed.

    Fails if the install directory exists already.

    By default creates an init_file with the code necessary to activate the
    conda install for your default shell. To activate source the init_file in
    your shell session. If the init_file already exists this step is skipped.
    If the init_file is created in a directory that is included in the path
    then the source command does not need to specify the whole path (e.g. 'source
    myconda' will work for the default location of the init file).

    Use this init file instead of allowing conda/mamba to modify your
    .bashrc/.zshrc/... to avoid automatic activation of a conda install
    which can result various hard to diagnose problems.

    --no-init
        Do not create init_file

    --init-only
        Create init file for an existing mamba forge install.

    --init-file=FILENAME
        Name of the init file. Defaults to '~/bin/myconda'

    --shell=SHELL
        Create init for SHELL instead of the your default shell. Allowed:
        bash, fish, tcsh, zsh, xonsh

    --debug
        Preserve temp dir

EXAMPLES
    Install in /data/$USER/conda
        mamba_install
    Install in /data/$USER/mymamba
        mamba_install /data/$USER/mymamba
    Install in /data/$USER/conda and put the init file in a different path
        mamba_install --init-file=~/bin/mymamba
    
Example Setup

In the following example, we will cover some basics of using conda to create private environments.

Option 1: Load the wrapper script to install a fresh mamba in your data directory (the default) or in an alternative directory of your choice.

[user@biowulf]$ sinteractive --mem=20g --gres=lscratch:20
[user@cn3444]$ module load mamba_install
[user@cn3444]$ mamba_install
...
[user@cn3444]$ source myconda
    

Option 2: Download the mambaforge installer and install it to a path in a data or shared directory like /data/$USER/conda. Start by allocating an interactive session and pointing TMPDIR at local scratch:

[user@biowulf]$ sinteractive --mem=20g --gres=lscratch:20
...
[user@cn3444]$ cd /data/$USER
[user@cn3444]$ export TMPDIR=/lscratch/$SLURM_JOB_ID
	

Then download the installer and run it in batch mode (-b) with the install prefix given by -p:

[user@cn3444]$ wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
--2022-04-01 11:13:41--  https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
[...snip...]
Length: 92971376 (89M) [application/octet-stream]
Saving to: ‘Mambaforge-Linux-x86_64.sh’

100%[===================================================>] 92,971,376   111MB/s   in 0.8s

2022-04-01 11:13:42 (111 MB/s) - ‘Mambaforge-Linux-x86_64.sh’ saved [92971376/92971376]


[user@cn3444]$ bash Mambaforge-Linux-x86_64.sh -p /data/$USER/conda -b
PREFIX=/data/$USER/conda
Unpacking payload ...
Extracting "python-3.9.10-h85951f9_2_cpython.tar.bz2"
Extracting "_libgcc_mutex-0.1-conda_forge.tar.bz2"
[...snip...]
installation finished.
[user@cn3444]$ rm Mambaforge-Linux-x86_64.sh
	

To use the newly installed conda you will have to source an init file. Do this each time you are going to work with your environment.

Do not allow conda/mamba to add automatic initialization to your startup files (e.g. .bashrc), as environments can interfere with login or NoMachine.

After sourcing the conda init file, activate the base environment and update the conda package manager which itself is just a package:

[user@cn3444]$ source /data/$USER/conda/etc/profile.d/conda.sh && source /data/$USER/conda/etc/profile.d/mamba.sh

### to make things easier you can create a file called `myconda` in a directory
### on your path such as ~/bin. This could be done like so (assuming the same
### paths as we used here).

[user@cn3444]$ mkdir -p ~/bin

### this whole multi-line "heredoc" creates an activation script
[user@cn3444]$ cat <<'__EOF__' > ~/bin/myconda
__conda_setup="$("/data/$USER/conda/bin/conda" 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/data/$USER/conda/etc/profile.d/conda.sh" ]; then
        . "/data/$USER/conda/etc/profile.d/conda.sh"
    else
        export PATH="/data/$USER/conda/bin:$PATH"
    fi
fi
unset __conda_setup

if [ -f "/data/$USER/conda/etc/profile.d/mamba.sh" ]; then
    . "/data/$USER/conda/etc/profile.d/mamba.sh"
fi
__EOF__

### then from *anywhere* the mambaforge install can be activated with
[user@cn3444]$ source myconda
### let's not show the large mamba banner all the time
[user@cn3444]$ export MAMBA_NO_BANNER=1
[user@cn3444]$ mamba activate base
(base) [user@cn3444]$ which python
/data/$USER/conda/bin/python
(base) [user@cn3444]$ mamba update --all
Looking for: ['_libgcc_mutex', 'ca-certificates', 'ld_impl_linux-64', 'libstdcxx-ng', 'libgomp', '_openmp_mutex', 'libgcc-ng', 'yaml-cpp', 'yaml', 'xz', 'reproc', 'openssl', 'ncurses', 'lzo', 'lz4-c', 'libzlib', 'libuuid', 'libnsl', 'libiconv', 'libffi', 'libev', 'keyutils', 'icu', 'c-ares', 'bzip2', 'reproc-cpp', 'libedit', 'readline', 'zstd', 'zlib', 'tk', 'krb5', 'sqlite', 'libxml2', 'libssh2', 'libsolv', 'libnghttp2', 'libarchive', 'libcurl', 'libmamba', 'pybind11-abi', 'tzdata', 'python', 'python_abi', 'setuptools', 'wheel', 'pip', 'six', 'pycparser', 'idna', 'colorama', 'charset-normalizer', 'tqdm', 'ruamel_yaml', 'pysocks', 'pycosat', 'libmambapy', 'certifi', 'cffi', 'conda-package-handling', 'cryptography', 'brotlipy', 'pyopenssl', 'urllib3', 'requests', 'conda', 'mamba']

conda-forge/noarch                                   7.8MB @   3.8MB/s  2.2s
conda-forge/linux-64                                21.9MB @   3.6MB/s  6.6s

Pinned packages:
  - python 3.9.*

[...snip...]
  Change: 4 packages
  Upgrade: 5 packages

  Total download: 47MB

───────────────────────────────────────────────────────────────────────────────

Confirm changes: [Y/n] Y

[...snip...]


Preparing transaction: done
Verifying transaction: done
Executing transaction: done

(base) [user@cn3444]$ mamba clean --all --yes
Cache location: /data/$USER/conda/pkgs
Will remove the following tarballs:

/data/$USER/conda/pkgs
------------------------
python-3.9.5-h12debd9_4.tar.bz2             22.6 MB
[...snip...]
idna-3.2-pyhd3eb1b0_0.conda                   48 KB

---------------------------------------------------
Total:                                      59.4 MB

Removed python-3.9.5-h12debd9_4.tar.bz2
[...snip...]
	

Now let's create a new environment called project1 with an older version of pysam from the bioconda channel and python 3.7. For this we use mamba.

(base) [user@cn3444]$ mamba deactivate
[user@cn3444]$ mamba create -n project1 python=3.7 numpy scipy bioconda::pysam==0.15.3 samtools==1.9
Looking for: ['python=3.7', 'numpy', 'scipy', 'bioconda::pysam==0.15.3', 'samtools==1.9']

bioconda/linux-64                                    4.1MB @   3.8MB/s  1.2s
bioconda/noarch                                      3.5MB @   2.8MB/s  1.3s
conda-forge/noarch                                   7.8MB @   3.9MB/s  2.2s
conda-forge/linux-64                                21.9MB @   3.9MB/s  6.2s
Transaction

  Prefix: /data/$USER/conda/envs/project1

  Updating specs:

   - python=3.7
   - numpy
   - scipy
   - bioconda::pysam==0.15.3
   - samtools==1.9

[...snip...]

Confirm changes: [Y/n] Y
[...snip...]
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ mamba activate project1
#
# To deactivate an active environment, use
#
#     $ mamba deactivate

[user@cn3444]$ mamba activate project1
(project1) [user@cn3444]$ which python
/data/$USER/conda/envs/project1/bin/python
(project1) [user@cn3444]$ samtools --version
samtools 1.9
Using htslib 1.9
Copyright (C) 2018 Genome Research Ltd.
(project1) [user@cn3444]$ mamba deactivate
[user@cn3444]$
	

Now an environment for a different project with current pysam, some other tools, and numpy. This time we add the bioconda channel to the channels for the environment so we don't have to use the bioconda:: prefix. A common pattern for environments used for bioinformatics software is to set up the bioconda and conda-forge channels on a per-environment basis. This allows conda-forge packages to override packages from the defaults channel. We also specify MKL for the numerical libraries and pin it so it won't change accidentally.

Note that at this point mamba does not yet include the config command.

[user@cn3444]$ mamba create -n project2 python=3.8
[...snip...]
[user@cn3444]$ mamba activate project2
(project2) [user@cn3444]$ conda config --env --add channels bioconda
(project2) [user@cn3444]$ conda config --env --add channels conda-forge
Warning: 'conda-forge' already in 'channels' list, moving to the top
(project2) [user@cn3444]$ conda config --env --set channel_priority strict
(project2) [user@cn3444]$ conda config --env --add pinned_packages blas=*=mkl
(project2) [user@cn3444]$ conda config --show-sources
==> /data/$USER/conda/.condarc <==
channels:
  - conda-forge

==> /data/$USER/conda/envs/project2/.condarc <==
pinned_packages:
  - libblas=*=*_mkl
  - python=3.8
channel_priority: strict
channels:
  - conda-forge
  - bioconda

(project2) [user@cn3444]$ mamba install -q pysam bedtools hisat2 blas numpy scipy
[...snip...]
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done

(project2) [user@cn3444]$ which python
/data/$USER/conda/envs/project2/bin/python
(project2) [user@cn3444]$ mamba install tensorflow=*=cuda*
[...snip...]
	

Note that pip can be used to install packages into conda environments as well. However, this can sometimes cause problems when pip overwrites existing conda-installed packages.

List the environments in the current conda install, then deactivate the active environment:

(project2) [user@cn3444]$ mamba info --env
# conda environments:
#
base                     /data/$USER/conda
project1                 /data/$USER/conda/envs/project1
project2              *  /data/$USER/conda/envs/project2
(project2) [user@cn3444]$ mamba deactivate
[user@cn3444]$ 
	

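Re-installing base conda: in rare cases a "start from fresh" solution is needed. The steps below move the existing install aside, install a fresh base, and move the old environments back: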

[user@cn3444]$ mv /data/$USER/conda /data/$USER/conda_backup
[user@cn3444]$ ml mamba_install
[user@cn3444]$ mamba_install
.....
[user@cn3444]$ mv /data/$USER/conda_backup/envs /data/$USER/conda
[user@cn3444]$ source myconda
[user@cn3444]$ mamba info --envs