High-Performance Computing at the NIH
Sambamba on Biowulf & Helix

Sambamba is a high-performance, robust tool (and library), written in the D programming language, for working with SAM and BAM files. Its current parallelised functionality covers an important subset of samtools functionality, including view, index, sort, markdup, and depth.
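A few illustrative invocations of these subcommands (the file names here are placeholders; the sambamba module must be loaded first, as described below):

$ sambamba sort -t 4 -o sample.sorted.bam sample.bam        # coordinate-sort a BAM file
$ sambamba index -t 4 sample.sorted.bam                     # create a .bai index
$ sambamba markdup -t 4 sample.sorted.bam sample.dedup.bam  # mark duplicate reads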

Running on Helix

$ module load sambamba
$ sambamba view OPTIONS <input.bam | input.sam> [region1 [...]]
 

Many sambamba subcommands can be run multi-threaded using the -t flag. Do not use more than 4 threads on Helix.
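For example, a simple SAM-to-BAM conversion on Helix (input and output file names are illustrative) might look like:

$ module load sambamba
$ sambamba view -t 4 -S -f bam -o sample.bam sample.sam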

Running a single batch job on Biowulf

1. Create a script file similar to the lines below.

#!/bin/bash

module load sambamba
cd /data/$USER/dir
sambamba view OPTIONS <input.bam | input.sam> [region1 [...]] -t $SLURM_CPUS_PER_TASK

2. Submit the script on Biowulf. The value assigned to '--cpus-per-task' will be passed to the '$SLURM_CPUS_PER_TASK' variable in the script:

$ sbatch --cpus-per-task=4 jobscript

To request more memory than the default (2 GB per allocated CPU, i.e. 8 GB for this 4-CPU job), use the --mem flag:

$ sbatch --cpus-per-task=4 --mem=10g jobscript
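As a concrete illustration, a complete script that coordinate-sorts and then indexes a BAM file might look like the sketch below (directory and file names are illustrative only):

#!/bin/bash
set -e

module load sambamba
cd /data/$USER/dir
sambamba sort -t $SLURM_CPUS_PER_TASK -o sample.sorted.bam sample.bam
sambamba index -t $SLURM_CPUS_PER_TASK sample.sorted.bam

It would then be submitted with sbatch as shown above.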

Running a swarm of jobs on Biowulf

Set up a swarm command file (e.g. swarmfile):

  cd /data/$USER/dir1; sambamba view OPTIONS <input.bam | input.sam> [region1 [...]] -t $SLURM_CPUS_PER_TASK
  cd /data/$USER/dir2; sambamba view OPTIONS <input.bam | input.sam> [region1 [...]] -t $SLURM_CPUS_PER_TASK
  cd /data/$USER/dir3; sambamba view OPTIONS <input.bam | input.sam> [region1 [...]] -t $SLURM_CPUS_PER_TASK
	[......]
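As a concrete example, a swarm file that converts one SAM file per directory to BAM (paths and file names are illustrative) could read:

  cd /data/$USER/dir1; sambamba view -t $SLURM_CPUS_PER_TASK -S -f bam -o sample1.bam sample1.sam
  cd /data/$USER/dir2; sambamba view -t $SLURM_CPUS_PER_TASK -S -f bam -o sample2.bam sample2.sam
  cd /data/$USER/dir3; sambamba view -t $SLURM_CPUS_PER_TASK -S -f bam -o sample3.bam sample3.sam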

Submit the swarm file:

  $ swarm -f swarmfile -t 4 --module sambamba

-f: specify the swarm command file name
-t: specify the number of threads (cpus) allocated to each command
--module: load the environment module(s) needed for each command line in the file

To allocate more memory, use the -g flag:

  $ swarm -f swarmfile -t 4 -g 10 --module sambamba

-g: gigabytes of memory allocated to each process

For more information regarding running swarm, see swarm.html

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive --cpus-per-task=4
salloc.exe: Granted job allocation 16535

cn999$ module load sambamba
cn999$ cd /data/$USER/dir
cn999$ sambamba view OPTIONS <input.bam | input.sam> [region1 [...]] -t $SLURM_CPUS_PER_TASK

cn999$ exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use the --mem flag. For example:

biowulf$ sinteractive --cpus-per-task=4 --mem=10g
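Within the allocated session, a typical command (file names are illustrative) might then be:

cn999$ sambamba markdup -t $SLURM_CPUS_PER_TASK sample.sorted.bam sample.dedup.bam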

Documentation

https://lomereiter.github.io/sambamba/docs/sambamba-view.html