samblaster is a program for marking duplicates and finding discordant/split read pairs in read-id grouped paired-end SAM files. When marking duplicates, samblaster uses about 20 MB of memory per 1M read pairs. In a read-id grouped SAM file, all alignments for a read-id (QNAME) are contiguous. Aligners naturally produce such files; they can also be created by sorting a SAM file by read-id.
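The memory rule of thumb above can be turned into a quick sizing estimate when requesting resources. The following is a minimal sketch; the function name `estimate_samblaster_mem_mb` is a hypothetical helper and not part of samblaster, and the result only covers duplicate marking as described above, not samtools or other overhead.

```shell
#!/bin/bash
# Hypothetical helper (not part of samblaster): estimate memory in MB for
# duplicate marking, using the rule of thumb of about 20 MB per 1M read pairs.
estimate_samblaster_mem_mb() {
    local read_pairs=$1
    # Integer arithmetic, rounded up to the next whole MB.
    echo $(( (read_pairs * 20 + 999999) / 1000000 ))
}

# Example: 250 million read pairs
estimate_samblaster_mem_mb 250000000   # prints 5000
```

It is probably sensible to request somewhat more memory than the estimate, since the figure is approximate.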
Run samblaster on a BAM file sorted by read name with duplicates already marked. Save discordant pairs to disc.sam and split reads to split.sam.
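Since samblaster expects read-id grouped input, it can be useful to confirm that a SAM stream really is grouped before running it. The sketch below is one possible check, not something samblaster provides; `is_readid_grouped` is a hypothetical helper name. It keeps every QNAME in memory, so it is best suited to spot checks on modest inputs.

```shell
#!/bin/bash
# Hypothetical helper: read SAM text on stdin and exit 0 if all records
# sharing a QNAME are on adjacent lines, or 1 if a QNAME reappears after
# records for a different QNAME.
is_readid_grouped() {
    awk -F'\t' '
        /^@/ { next }                          # skip SAM header lines
        $1 != prev {
            if ($1 in seen) { bad = 1; exit }  # QNAME seen earlier, not adjacent
            seen[$1] = 1
            prev = $1
        }
        END { exit bad + 0 }
    '
}

# Usage (path is a placeholder; requires samtools):
# samtools view -h /path/to/input.bam | is_readid_grouped && echo "grouped"
```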
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load samblaster
[+] Loading samblaster 0.1.22
[user@cn3144 ~]$ samtools view -h /usr/local/apps/samblaster/TEST_DATA/test.bam \
    | samblaster --ignoreUnmated -a -e -d disc.sam -s split.sam -o /dev/null
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
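After a run like the one above, a quick sanity check is to count the alignment records written to disc.sam and split.sam. The following is a small sketch; `count_sam_records` is a hypothetical helper that simply counts the lines not starting with `@`, that is, everything except the SAM header.

```shell
#!/bin/bash
# Hypothetical helper: count alignment records (non-header lines) in a SAM file.
count_sam_records() {
    # -v inverts the match and -c counts, so this counts lines
    # that do NOT start with '@' (the SAM header prefix).
    grep -vc '^@' "$1"
}

# Usage, assuming the output names from the session above:
# count_sam_records disc.sam
# count_sam_records split.sam
```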
Create a batch input file (e.g. samblaster.sh). For example:
#!/bin/bash
module load samtools samblaster || exit 1
samtools view -h /path/to/input.bam \
    | samblaster -e -d disc.sam -s split.sam -o /dev/null
Submit this job using the Slurm sbatch command.
sbatch samblaster.sh
Create a swarmfile (e.g. samblaster.swarm). For example:
samtools view -h /path/to/input1.bam \
    | samblaster -e -d disc1.sam -s split1.sam -o /dev/null
samtools view -h /path/to/input2.bam \
    | samblaster -e -d disc2.sam -s split2.sam -o /dev/null
samtools view -h /path/to/input3.bam \
    | samblaster -e -d disc3.sam -s split3.sam -o /dev/null
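For more than a handful of inputs, a swarmfile of this shape can be generated rather than written by hand. The sketch below is one way to do it; `make_samblaster_swarm` is a hypothetical helper, and the output naming (`<name>.disc.sam`, `<name>.split.sam`, derived from each BAM file name) is an assumption that differs from the numbered names in the example. It assumes paths contain no whitespace.

```shell
#!/bin/bash
# Hypothetical helper: print one swarm command line per BAM path given.
# Output file names are derived from the BAM base name (an assumed convention).
make_samblaster_swarm() {
    local bam base
    for bam in "$@"; do
        base=$(basename "$bam" .bam)
        printf 'samtools view -h %s | samblaster -e -d %s.disc.sam -s %s.split.sam -o /dev/null\n' \
            "$bam" "$base" "$base"
    done
}

# Usage (paths are placeholders):
# make_samblaster_swarm /path/to/input1.bam /path/to/input2.bam > samblaster.swarm
```

Each generated command sits on a single line, so no line continuations are needed in the resulting file.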
Submit this job using the swarm command.
swarm -f samblaster.swarm --module samblaster
where
--module samblaster | Loads the samblaster module for each subjob in the swarm |