samblaster is a program for marking duplicates and finding discordant/split read pairs in read-id grouped paired-end SAM files. When marking duplicates, samblaster uses about 20MB per 1M read pairs. In a read-id grouped SAM file, all alignments for a read-id (QNAME) are contiguous. Aligners naturally produce such files; they can also be created by sorting a SAM file by read-id.
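
To check whether an existing file meets the read-id grouping requirement, something like the following sketch may help. The function name is_readid_grouped is hypothetical and not part of samblaster or samtools. It reads SAM text on stdin and assumes tab-separated records with the QNAME in the first column.

```shell
# Hypothetical helper: exit status 0 if every QNAME on stdin forms one
# contiguous run of lines, 1 otherwise. Header lines (starting with @)
# are skipped. All distinct QNAMEs are kept in memory, so this is only
# meant for spot checks on modest inputs.
is_readid_grouped() {
  awk -F '\t' '
    /^@/ { next }                          # skip SAM header lines
    $1 != prev {                           # a new run of QNAMEs starts
      if ($1 in seen) { bad = 1; exit }    # this QNAME already had a run
      seen[$1] = 1
      prev = $1
    }
    END { exit bad + 0 }
  '
}
```

A possible use is samtools view -h input.bam | is_readid_grouped && echo grouped, where input.bam is a placeholder. If the check fails, samtools sort -n sorts a file by read name.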


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@cn3144 ~]$ module load samblaster
[+] Loading samblaster 0.1.22

[user@cn3144 ~]$ samtools view -h /usr/local/apps/samblaster/TEST_DATA/test.bam \
  | samblaster --ignoreUnmated -a -e -d disc.sam -s split.sam -o /dev/null
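
After a run like the one above, a quick sanity check is to count how many alignment records ended up in the discordant and split read files. The sketch below assumes the output names disc.sam and split.sam from the sample session; the function name count_sam_records is hypothetical.

```shell
# Hypothetical helper: count alignment records (non-header lines) in a
# SAM file. SAM header lines start with "@".
count_sam_records() {
  awk '!/^@/ { n++ } END { print n + 0 }' "$1"
}

# Report counts for the files written by -d and -s, if they exist.
for f in disc.sam split.sam; do
  if [ -f "$f" ]; then
    printf '%s\t%s\n' "$f" "$(count_sam_records "$f")"
  fi
done
```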

Batch job
Most jobs should be run as batch jobs.

Create a batch input file. For example:


#!/bin/bash
module load samtools samblaster || exit 1
samtools view -h /path/to/input.bam \
  | samblaster -e -d disc.sam -s split.sam -o /dev/null

Submit this job using the Slurm sbatch command.
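
The memory request for the job can be roughly sized from the figure of about 20MB per 1M read pairs. The following is a minimal sketch of that arithmetic. The function name estimate_mem_mb and the script name samblaster.sh are assumptions, and the estimate covers only duplicate marking in samblaster, so some headroom for the rest of the pipeline is advisable.

```shell
# Hypothetical helper: estimate samblaster memory in MB for a given
# number of read pairs, at ~20 MB per 1M read pairs, rounded up.
estimate_mem_mb() {
  echo $(( ($1 * 20 + 999999) / 1000000 ))
}

# Example: 100 million read pairs.
estimate_mem_mb 100000000

# The value could then be passed to sbatch (script name is a placeholder):
#   sbatch --mem="$(estimate_mem_mb 100000000)M" samblaster.sh
```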

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. samblaster.swarm). For example:

samtools view -h /path/to/input1.bam \
  | samblaster -e -d disc1.sam -s split1.sam -o /dev/null
samtools view -h /path/to/input2.bam \
  | samblaster -e -d disc2.sam -s split2.sam -o /dev/null
samtools view -h /path/to/input3.bam \
  | samblaster -e -d disc3.sam -s split3.sam -o /dev/null

Submit this job using the swarm command.

swarm -f samblaster.swarm --module samblaster
where --module samblaster loads the samblaster module for each subjob in the swarm.
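
For more than a handful of inputs, the swarmfile can be generated rather than written by hand. The sketch below prints one command per BAM file in a directory. The function name make_samblaster_swarm and the output naming scheme (NAME.disc.sam, NAME.split.sam) are assumptions for illustration, and paths containing spaces are not handled.

```shell
# Hypothetical helper: print one samblaster pipeline per BAM file found
# in directory $1, one command per line, with the same options as the
# swarmfile example above.
make_samblaster_swarm() {
  for bam in "$1"/*.bam; do
    [ -e "$bam" ] || continue            # glob matched nothing
    name=$(basename "$bam" .bam)         # input1.bam -> input1
    printf 'samtools view -h %s | samblaster -e -d %s.disc.sam -s %s.split.sam -o /dev/null\n' \
      "$bam" "$name" "$name"
  done
}
```

For example, make_samblaster_swarm /path/to/bams > samblaster.swarm would write the swarmfile, which is then submitted with the swarm command shown above.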