Biowulf High Performance Computing at the NIH
Crumble on Biowulf

Crumble is a tool that compresses SAM/BAM/CRAM files with controlled loss of quality values. It reads a SAM/BAM/CRAM file, computes which quality (confidence) values to keep and which to omit, and writes a new file with most qualities removed.


Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn4224 are ready for job

[user@cn4224 ~]$ cd /data/$USER

[user@cn4224 ~]$ module load crumble
[+] Loading crumble  0.8.3  on cn4224

[user@cn4224 ~]$ crumble -v -O sam test.bam test.sam
--- Crumble v0.8.3: parameters ---
reduce qual:   yes
indel STR mul: 1.00
indel STR add: 2
SNP   STR mul: 0.00
SNP   STR add: 0
Qual low  5, used for discrepant bases in high conf call
Qual high 40, used for matching bases in high conf call
Keep if mqual <= 0
Calls without mqual: disabled.
Calls with mqual, keep qual if:
  SNP < 70,  indel < 125,  discrep > 1.50
Low mqual perc   = 1.000000
Ins length perc  = 1.000000
indel ov perc    = 0.000000
overdepth factor = 999.000000
P-block level    = 8
Processing 1:111881

: Counts of positions preserved by option
A/B Diff         = 0
A/B Indel        = 4686 / 6221
A:  Het          = 0 / 0
A:  Hom          = 0 / 0
A:  Discrep      = 0
B:  Het          = 33 / 47
B:  Hom          = 112665 / 188104
B:  Discrep      = 210

Columns          = 188151
Low_mqual_perc   = 0
Clip_perc        = 50
Ins_len_perc     = 0
indel_ov_perc    = 0
count_over_depth = 0

[user@cn4224 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226

[user@biowulf ~]$
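
Since Crumble exists to shrink files, it can be useful to check how much a run actually saved. The following is a minimal sketch, not part of Crumble itself: size_ratio is a hypothetical helper name, and the output file name in the usage comment is an assumption. It only makes sense when input and output use the same container format (e.g. BAM in, BAM out via -O bam), not for the SAM output shown in the session above.

```shell
# size_ratio IN OUT: print OUT's size as a percentage of IN's size.
# Hypothetical helper; uses only wc and awk.
size_ratio() {
    in_bytes=$(wc -c < "$1")
    out_bytes=$(wc -c < "$2")
    # Guard against division by zero for an empty input file.
    if [ "$in_bytes" -eq 0 ]; then
        echo "input file is empty" >&2
        return 1
    fi
    awk -v i="$in_bytes" -v o="$out_bytes" \
        'BEGIN { printf "%.1f%%\n", 100 * o / i }'
}

# Example use after a run (file names are assumptions):
#   crumble -O bam test.bam test.crumble.bam
#   size_ratio test.bam test.crumble.bam
```

A lower percentage means a smaller output file. The ratio says nothing about how the removed qualities affect downstream analysis.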

Batch job
Most jobs should be run as batch jobs.

Create a batch input file similar to the following:

#! /bin/bash

set -e

module load crumble

cd /data/$USER

crumble -v -O sam test.bam test.sam

Submit this job using the Slurm sbatch command.
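
As a rough illustration of the submission step, the sketch below wraps sbatch in a small helper. The helper name submit_crumble, the script name crumble.sh, and the default CPU and memory values are all assumptions chosen for the example, not recommendations from this page. --cpus-per-task and --mem are standard sbatch options.

```shell
# submit_crumble SCRIPT [CPUS] [MEM]: submit a batch script with sbatch.
# Hypothetical helper. With DRY_RUN=1 the command is printed, not submitted.
submit_crumble() {
    script=$1
    cpus=${2:-2}     # assumed placeholder default; adjust for your data
    mem=${3:-4g}     # assumed placeholder default; adjust for your data
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "sbatch --cpus-per-task=$cpus --mem=$mem $script"
    else
        sbatch --cpus-per-task="$cpus" --mem="$mem" "$script"
    fi
}

# Example use (crumble.sh is a hypothetical name for the batch file above):
#   DRY_RUN=1 submit_crumble crumble.sh        # show the command only
#   submit_crumble crumble.sh 2 8g             # actually submit
```

The dry-run switch makes it easy to check the resource request before anything is queued.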

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. crumble.swarm). For example:

crumble -v -O sam file01.bam file01.sam
crumble -v -O sam file02.bam file02.sam
crumble -v -O sam file03.bam file03.sam

Submit this job using the swarm command.

swarm -f crumble.swarm [-g #] --module crumble

where
-g #              Number of gigabytes of memory required for each process (1 line in the swarm command file)
--module crumble  Loads the crumble module for each subjob in the swarm
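
For more than a handful of files, the swarmfile can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: make_swarm is a hypothetical helper name, it reuses the crumble options from the examples above, and it assumes file names without spaces, since each swarmfile line is a plain command.

```shell
# make_swarm DIR: print one crumble command per *.bam file in DIR.
# Hypothetical helper; the output name replaces the .bam suffix with .sam.
make_swarm() {
    for bam in "$1"/*.bam; do
        [ -e "$bam" ] || continue    # skip when the glob matched nothing
        echo "crumble -v -O sam $bam ${bam%.bam}.sam"
    done
}

# Example use (directory is an assumption):
#   make_swarm /data/$USER > crumble.swarm
#   swarm -f crumble.swarm --module crumble
```

Each BAM file becomes one line, and therefore one independent subjob, in the swarm.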