High-Performance Computing at the NIH
BamTools on Helix & Biowulf

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files. It was developed by Derek Barnett in the Marth lab at Boston College.

 

Running on Helix

$ module load bamtools
$ bamtools convert -format fastq -in file.bam -out file.fastq

Submitting a single batch job

1. Create a script file along the following lines:

#!/bin/bash 

cd /data/$USER/mydir
module load bamtools
bamtools convert -format fastq -in file.bam -out file.fastq
bamtools sort -in file.bam -out file.sorted.bam

2. Submit to the batch system with:

[user@biowulf]$ sbatch jobscript

If more memory is required (default 4gb), specify --mem=Mg, where M is the number of gigabytes, for example --mem=10g:

$ sbatch --mem=10g jobscript
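
As an alternative to passing --mem on the command line, the memory request can be kept inside the job script itself using a standard Slurm #SBATCH directive. The following is a minimal sketch, not a site-recommended template: it writes such a script to a file named jobscript, reusing the placeholder directory and file names from this page, and it assumes the explicit -in/-out form of bamtools sort.

```shell
# Write a job script that carries its own memory request.
# The directory and BAM file names are the placeholders used on this page.
cat > jobscript <<'EOF'
#!/bin/bash
#SBATCH --mem=10g

set -e                      # stop at the first failing command
cd /data/$USER/mydir
module load bamtools
bamtools sort -in file.bam -out file.sorted.bam
EOF
```

With the directive in place, the script can be submitted with a plain 'sbatch jobscript'.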

 

Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/$USER/cmdfile) with one command line per job:

cd /data/$USER/dir1; bamtools convert -format fastq -in file1.bam -out file1.fastq
cd /data/$USER/dir2; bamtools convert -format fastq -in file2.bam -out file2.fastq
cd /data/$USER/dir3; bamtools convert -format fastq -in file3.bam -out file3.fastq
[...]
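
For more than a handful of samples, the command file can be generated rather than typed by hand. The sketch below assumes the dirN/fileN.bam layout of the example above; make_swarm_lines is a hypothetical helper name, not part of swarm or BamTools, and it writes the conversions with explicit -format, -in and -out options.

```shell
#!/bin/bash
# Hypothetical helper: print one swarm command line per sample number.
# The dirN/fileN.bam naming is an assumption taken from the example above.
make_swarm_lines() {
    local n
    for n in "$@"; do
        # Single quotes keep $USER literal so it is expanded when the job runs.
        printf 'cd /data/$USER/dir%s; bamtools convert -format fastq -in file%s.bam -out file%s.fastq\n' "$n" "$n" "$n"
    done
}

# Write the command file for samples 1 to 3.
make_swarm_lines 1 2 3 > cmdfile
```

Each line of the resulting cmdfile is an independent command that swarm runs as its own job.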

Submit this job with:

$ swarm -f cmdfile --module bamtools

If each command requires more memory, submit with -g M, where M is the number of gigabytes per command. For example, the following allocates 10gb for each command:

$ swarm -g 10 -f cmdfile --module bamtools

For more information regarding running swarm, see swarm.html

 

Running an interactive job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

[user@biowulf]$ sinteractive --mem=10g
salloc.exe: Granted job allocation 1528
[user@pXXX]$ module load bamtools
[user@pXXX]$ cd /data/$USER/bamtools/run1
[user@pXXX]$ bamtools convert -format fastq -in file1.bam -out file1.fastq
[user@pXXX]$ bamtools merge -in file1.bam -in file2.bam -in file3.bam -out merged.bam
[user@pXXX]$ exit
[user@biowulf]$

Documentation

After loading the bamtools module, typing 'bamtools' with no parameters will print a simple help page.

[user@biowulf]$ module load bamtools
[user@biowulf]$ bamtools

usage: bamtools [--help] COMMAND [ARGS]

Available bamtools commands:
        convert         Converts between BAM and a number of other formats
        count           Prints number of alignments in BAM file(s)
        coverage        Prints coverage statistics from the input BAM file
        filter          Filters BAM file(s) by user-specified criteria
        header          Prints BAM header information
        index           Generates index for BAM file
        merge           Merge multiple BAM files into single file
        random          Select random alignments from existing BAM file(s), intended more as a testing tool.
        resolve         Resolves paired-end reads (marking the IsProperPair flag as needed)
        revert          Removes duplicate marks and restores original base qualities
        sort            Sorts the BAM file according to some criteria
        split           Splits a BAM file on user-specified property, creating a new BAM output file for each value found
        stats           Prints some basic statistics from input BAM file(s)

See 'bamtools help COMMAND' for more information on a specific command.

PDF documentation

Bamtools Wiki