High-Performance Computing at the NIH
bamsurgeon on biowulf and helix
Tools for adding mutations to existing .bam files; used for testing mutation callers

Bamsurgeon Environment Module

Before running a bamsurgeon program, you must add the bamsurgeon environment module and the other modules it uses into your shell environment. This is most easily done by using the module commands, as in the example below:

[user@biowulf]$ module avail bamsurgeon                   (see what versions are available)

-------------------- /usr/local/Modules/3.2.9/modulefiles --------------------
[user@biowulf]$ module load bamsurgeon                           (load the default version and required packages)
[user@biowulf]$ module list                                      (see what versions are loaded)
Currently Loaded Modules:
  1) bamsurgeon/2016.04.14
[user@biowulf]$ module unload bamsurgeon                         (unload bamsurgeon and required packages)

If you have been using bamsurgeon/2015.03.13, you may notice a change in the output of "module load" and "module unload" for bamsurgeon/2016.04.14. With the earlier version, the dependencies (bwa, exonerate, picard, python, samtools, and velvet) were announced as they were loaded or unloaded; with bamsurgeon/2016.04.14 they are not. The dependency programs are no longer loaded (or unloaded) as separate modules. Instead, they are embedded in the module bamsurgeon/2016.04.14, which prevents an unintended version change of any dependency program.

Environment Variables for Module bamsurgeon

[user@biowulf]$ module whatis bamsurgeon
bamsurgeon/2016.04.14                       : Sets up bamsurgeon/2016.04.14 using 
bamsurgeon/2016.04.14                       :   bwa/0.7.13 exonerate/2.4.0 picard/1.131 
bamsurgeon/2016.04.14                       :   python/2.7.9 samtools/1.3.1 velvet/1.2.10 
bamsurgeon/2016.04.14                       :   with environment variables 
bamsurgeon/2016.04.14                       :     BAMSURGEON_EXECS    -- addindel.py addsvn.py addsv.py 
bamsurgeon/2016.04.14                       :     BAMSURGEON_TEST     -- Path to bamsurgeon test programs 
bamsurgeon/2016.04.14                       :     BAMSURGEON_DATA     -- Path to bamsurgeon test data 
bamsurgeon/2016.04.14                       :     GENOME_REFERENCE    -- Path to genome ref for use with 'bamsurgeon' 
[user@biowulf]$ echo $BAMSURGEON_EXECS
addindel.py addsvn.py addsv.py
[user@biowulf]$ echo $GENOME_REFERENCE
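
The names in BAMSURGEON_EXECS can be used to confirm that the module environment is in place. The sketch below is a minimal check, assuming (this is not stated above) that loading the module puts the listed programs on your PATH. The function name missing_execs is hypothetical.

```shell
# Print every name in a space-separated list that is NOT found on PATH.
# Assumption: after "module load bamsurgeon" the programs named in
# BAMSURGEON_EXECS are reachable through PATH.
missing_execs() {
    local name
    # The list is deliberately left unquoted so it splits on whitespace.
    for name in ${1:-}; do
        command -v "$name" >/dev/null 2>&1 || echo "$name"
    done
}
```

After "module load bamsurgeon", running missing_execs "$BAMSURGEON_EXECS" should print nothing if every listed program can be found.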

Executables for Module bamsurgeon

Bamsurgeon test scripts

Writing a bamsurgeon program

The bamsurgeon module provides three python programs: addsnv.py, addsv.py, and addindel.py. Each adds one class of mutation (single-nucleotide variants, structural variants, and indels, respectively) to reads and outputs the modified reads, along with their mates, as a .bam file.
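
As a rough illustration of how these programs are called, the sketch below only builds a command line and prints it, without running anything. The flags -v (variant file), -f (input .bam), -r (reference fasta), and -o (output .bam) reflect the upstream bamsurgeon usage as far as I know; confirm them with each program's -h output. The function name and all file names are hypothetical.

```shell
# Print (do not run) a command line for one of the bamsurgeon python programs.
# Arguments: program name, variant file, input BAM, reference FASTA, output BAM.
# Assumption: the -v/-f/-r/-o flags match the installed version; check with -h.
bamsurgeon_cmd() {
    printf '%s -v %s -f %s -r %s -o %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical file names, for illustration only.
bamsurgeon_cmd addsnv.py snvs.txt input.bam reference.fasta output.bam
```

Printing the command first makes it easy to inspect the arguments before committing cluster time to the real run.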

When the bamsurgeon module is loaded, the environment variable GENOME_REFERENCE holds the absolute path of a "genome reference" fasta file that was built at the Broad Institute for use with the programs in the bamsurgeon module. Use this file for all bamsurgeon operations applied to the human genome.

$ module load bamsurgeon
$ ls -l /fdb/bamsurgeon/Homo_sapiens/GenomeReference/
total 8401372
-rw-r--r-- 1 sandor staff 3140756381 Jun 29  2010 Homo_sapiens_assembly19.fasta
-rw-r--r-- 1 sandor staff       6597 Feb  5  2014 Homo_sapiens_assembly19.fasta.amb
-rw-r--r-- 1 sandor staff       6901 Feb  5  2014 Homo_sapiens_assembly19.fasta.ann
-rw-r--r-- 1 sandor staff 3101976644 Feb  5  2014 Homo_sapiens_assembly19.fasta.bwt
-rw-r--r-- 1 sandor staff       2780 Feb  5  2014 Homo_sapiens_assembly19.fasta.fai
-rw-r--r-- 1 sandor staff  775494142 Feb  5  2014 Homo_sapiens_assembly19.fasta.pac
-rw-r--r-- 1 sandor staff 1550988336 Feb  5  2014 Homo_sapiens_assembly19.fasta.sa
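
If you work with a different reference, it may help to check that it has the same companion files as the one listed above. This is a minimal sketch: the suffix list is copied from the listing, and the function name missing_index_files is hypothetical.

```shell
# Print each expected companion file of a reference FASTA that does not exist.
# The suffixes (.amb .ann .bwt .fai .pac .sa) mirror the directory listing above.
missing_index_files() {
    local fasta="$1" suffix
    for suffix in amb ann bwt fai pac sa; do
        [ -e "${fasta}.${suffix}" ] || echo "${fasta}.${suffix}"
    done
}
```

Called with the path of a fasta file, it prints nothing when all six companion files are present.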

Running a bamsurgeon program

A "bamsurgeon program" is any program that invokes one or more of the python programs in the bamsurgeon module.

(See the Documentation section for usage and examples of these python programs)

Typically, a "bamsurgeon program" is written in a scripting language (e.g. /bin/bash or /usr/bin/perl). However your bamsurgeon program is organized, two conditions must hold before it invokes any of the bamsurgeon python programs:

  1. The job must be running on a biowulf cluster node (NOT the biowulf login node)
  2. The job's environment must contain the results of the command "module load bamsurgeon"
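
The two conditions above can be checked at the top of a bash script. The sketch below makes two assumptions that are not confirmed here: that the login node's short hostname is "biowulf" (as the prompts on this page suggest), and that a non-empty BAMSURGEON_EXECS is a fair sign that the module is loaded. All function names are hypothetical.

```shell
# Return 0 if the host looks like a compute node rather than the login node.
# Assumption: the login node's short hostname is "biowulf".
# An optional argument overrides the detected hostname (useful for testing).
on_compute_node() {
    local host="${1:-$(hostname -s)}"
    [ "$host" != "biowulf" ]
}

# Return 0 if the bamsurgeon module environment appears to be loaded,
# judged by the BAMSURGEON_EXECS variable that the module sets.
bamsurgeon_env_ready() {
    [ -n "${BAMSURGEON_EXECS:-}" ]
}

# Check both conditions; print a message and return 1 on the first failure.
preflight() {
    on_compute_node "$@" || { echo "error: run on a compute node, not the login node" >&2; return 1; }
    bamsurgeon_env_ready || { echo "error: run 'module load bamsurgeon' first" >&2; return 1; }
}
```

A script could call "preflight || exit 1" before its first invocation of a bamsurgeon python program.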

Running a single Bamsurgeon batch job on Biowulf

(See the section of the same name for the samtools application.)

Running a swarm of Bamsurgeon jobs

(See the section of the same name for the samtools application.)

For more information regarding running swarm, see swarm.html
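
A swarm command file holds one command per line. As a minimal sketch, the function below prints one addsnv.py command for each .bam file in a directory. The function name, the "_mut" output suffix, and all paths are hypothetical, and the -v/-f/-r/-o flags are assumed from upstream bamsurgeon usage.

```shell
# Print one addsnv.py command line per *.bam file found in a directory.
# Arguments: input BAM directory, variant file, reference FASTA, output directory.
# Assumption: the -v/-f/-r/-o flags match the installed addsnv.py; check with -h.
make_swarm_lines() {
    local bam_dir="$1" varfile="$2" ref="$3" out_dir="$4" bam base
    for bam in "$bam_dir"/*.bam; do
        [ -e "$bam" ] || continue   # skip the unexpanded pattern when no BAM files exist
        base=$(basename "$bam" .bam)
        echo "addsnv.py -v $varfile -f $bam -r $ref -o $out_dir/${base}_mut.bam"
    done
}
```

Redirect the output to a file and submit that file with swarm. How the bamsurgeon module gets loaded for each command depends on the swarm options you use; see swarm.html.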

Running an interactive Bamsurgeon job on Biowulf

(See the section of the same name for the samtools application.)