High-Performance Computing at the NIH
bowtie2 on Biowulf2 & Helix

Description

Bowtie2 is a fast, multi-threaded, and memory-efficient aligner for short read sequences. It uses an FM index to achieve a moderate memory footprint of 2-4 GB, depending on genome size and alignment parameters. Performance scales well with thread count.

Note that this page only describes bowtie2; bowtie1 is described on a separate page. Unlike bowtie1, bowtie2 supports local and gapped alignments, among other enhancements and new features. It is also better suited for longer reads and calculates a more informative MAPQ than bowtie1.
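
For example, the choice between end-to-end and local alignment is a single flag. A minimal sketch (the index basename genome matches the examples below; the read and output file names are placeholders):

  # end-to-end (default): the entire read must align, no soft clipping
  bowtie2 --end-to-end -x genome -U reads.fastq.gz -S end_to_end.sam
  # local: read ends may be soft clipped if that yields a better score
  bowtie2 --local -x genome -U reads.fastq.gz -S local.sam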

Index files

Bowtie2 indices are available as part of the igenomes package under

/fdb/igenomes/[organism]/[source]/[build]/Sequence/Bowtie2Index/*

More information on the locally available iGenomes builds and organisms is available from our scientific database index. For general information about iGenomes, see the iGenomes readme.
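
For example, to align against the UCSC mm9 build of the mouse genome (used throughout the examples below), point BOWTIE2_INDEXES at the corresponding igenomes directory and reference the index by its basename, genome. The read and output file names here are placeholders:

  export BOWTIE2_INDEXES=/fdb/igenomes/Mus_musculus/UCSC/mm9/Sequence/Bowtie2Index
  bowtie2 -x genome -U reads.fastq.gz -S aligned.sam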

Performance considerations

Bowtie2 is a multithreaded application. To determine how well its performance scales with the number of threads, the same input file (the first 25M ChIP-Seq reads from mouse heart tissue, ENCSR699XHY, replicate 2) was aligned with different numbers of threads. Note that some versions of bowtie2, in addition to the alignment threads, also start supporting processes such as a process to expand compressed input. These processes can reduce efficiency due to frequent context switches (i.e. job overloading). To measure this effect, the same experiment was carried out under two different conditions: (1) bowtie2 was allowed to run as many threads as there were allocated CPUs (--threads=$SLURM_CPUS_PER_TASK), which led to a mild overload, and (2) two extra CPUs were allocated beyond the number of bowtie2 threads (i.e. --threads=$(( SLURM_CPUS_PER_TASK - 2 ))) to account for the extra processes.

Local alignments were approximately 20% slower than end-to-end alignments. Both were done in sensitive mode for several versions of bowtie2.

From this we can see that allocating 2 more CPUs than there are bowtie2 threads can yield a modest performance benefit. Bowtie2 also scales approximately linearly with the number of threads up to 32; however, the slope is less than 1, so running with more than 8-16 threads becomes inefficient.
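
In a batch script this recommendation translates into a thread count derived from the allocation, for example (a sketch; the read and output file names are placeholders, and a complete script is shown below):

  # reserve 2 of the allocated CPUs for bowtie2's supporting processes
  threads=$(( SLURM_CPUS_PER_TASK - 2 ))
  bowtie2 --sensitive -p ${threads} -x genome -U reads.fastq.gz -S aligned.sam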

Running bowtie2 on Helix

Bowtie2 is a multi-threaded application. The thread count is set with -p / --threads and defaults to 1. Please do not run bowtie2 with more than 2 threads on the interactive Helix system. Note that bowtie and bowtie2 are treated as a single application in the environment modules system, so the default bowtie module is the most recent version of bowtie2.

The following example aligns single-end data, creating BAM output directly and uncompressing gzipped fastq on the fly. In this case, the alignment mode is set to sensitive-local. Alignments are filtered to remove any with MAPQ < 30.

helix$ module load bowtie/2-2.2.5
helix$ module load samtools/1.2
helix$ cd /path/to/bamdir
helix$ export BOWTIE2_INDEXES=/fdb/igenomes/Mus_musculus/UCSC/mm9/Sequence/Bowtie2Index/
helix$ bowtie2 --sensitive-local -p2 --no-unal -x genome \
      -U /usr/local/apps/bowtie/TEST_DATA/ENCFF001KPB.fastq.gz \
   | samtools view -q30 -Sb - > bam/ENCFF001KPB.bam
9157799 reads; of these:
  9157799 (100.00%) were unpaired; of these:
    706351 (7.71%) aligned 0 times
    6112559 (66.75%) aligned exactly 1 time
    2338889 (25.54%) aligned >1 times
92.29% overall alignment rate

Note that a specific bowtie2 version is selected explicitly here. Substitute any fixed bowtie2 version, or load the default module to use the most recent installed bowtie2.
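
Paired-end data is aligned in the same way, supplying the mate files with -1 and -2 instead of -U. A minimal sketch, assuming two hypothetical mate files:

helix$ bowtie2 --sensitive-local -p2 --no-unal -x genome \
      -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
   | samtools view -q30 -Sb - > bam/sample.bam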

Running a single bowtie2 batch job on Biowulf2

Batch scripts should use the $SLURM_CPUS_PER_TASK environment variable to determine the number of threads for bowtie2. This variable is set by Slurm according to the number of requested CPUs. As explained above, please run bowtie2 with two threads fewer than $SLURM_CPUS_PER_TASK, since bowtie2 starts additional supporting processes that would otherwise overload the job. An example script might look like the following:

#! /bin/bash
set -o pipefail
set -e

echo "Running on $SLURM_CPUS_PER_TASK CPUs"
# leave 2 of the allocated CPUs for bowtie2's supporting processes
threads=$(( SLURM_CPUS_PER_TASK - 2 ))
module load bowtie/2-2.2.5 || exit 1
module load samtools/1.2 || exit 1

cd /data/$USER/test_data
export BOWTIE2_INDEXES=/fdb/igenomes/Mus_musculus/UCSC/mm9/Sequence/Bowtie2Index
bowtie2 --sensitive-local -p ${threads} --no-unal -x genome \
      -U /usr/local/apps/bowtie/TEST_DATA/ENCFF001KPB.fastq.gz \
   | samtools view -q30 -Sb - > ENCFF001KPB.bam

The script is submitted to the batch system requesting, for example, 16 CPUs and 5GB of memory:

bowtie2$ sbatch --cpus-per-task=16 --mem=5g example_bowtie2_slurm.sh
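
Once the job is running, its CPU and memory utilization can be checked with jobload (also shown for the swarm example below):

bowtie2$ jobload -u $USER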

Running a swarm of bowtie2 batch jobs on Biowulf2

First a command file is created with one task per line (line continuations are allowed). The following example contains two tasks:

cd /data/$USER/test_data \
   && export BOWTIE2_INDEXES=/fdb/igenomes/Mus_musculus/UCSC/mm9/Sequence/Bowtie2Index \
   && bowtie2 --sensitive-local -p $(( SLURM_CPUS_PER_TASK - 2 )) --no-unal -x genome \
         -U /usr/local/apps/bowtie/TEST_DATA/ENCFF001KPB.fastq.gz \
       | samtools view -q30 -Sb - > ENCFF001KPB.bam
cd /data/$USER/test_data \
   && export BOWTIE2_INDEXES=/fdb/igenomes/Mus_musculus/UCSC/mm9/Sequence/Bowtie2Index \
   && bowtie2 --sensitive-local -p $(( SLURM_CPUS_PER_TASK - 2 )) --no-unal -x genome \
         -U /usr/local/apps/bowtie/TEST_DATA/ENCFF322WUF.fastq.gz \
       | samtools view -q30 -Sb - > ENCFF322WUF.bam

The two tasks in the command file (example_bowtie2_swarm) can then be executed using swarm requesting 8 CPUs and 5GB of memory for each task. Modules required for the tasks can be provided on the command line as well.

biowulf2$ swarm -f example_bowtie2_swarm -t 8 -g 5 --module bowtie/2-2.2.5,samtools/1.2
2 commands run in 2 tasks, each requiring 5 gb, 8 threads, and 0 gb diskspace
biowulf2$ jobload -u $USER
     JOBID      RUNTIME     NODES   CPUS    AVG CPU%            MEMORY
                                                              Used/Alloc
   nnnnn_1     00:00:09      pnnn      8       13.97       2.0 GB/5.0 GB
   nnnnn_0     00:00:09      pnnn      8       12.50       2.0 GB/5.0 GB

Running an interactive job on Biowulf2

Interactive work requiring significant resources should be carried out on interactive compute nodes, not on the head node or Helix. Interactive nodes are allocated with sinteractive. For example, to request an 8-core interactive node with 5 GB of memory:

$ sinteractive -c 8 --mem=5g

On the interactive node, bowtie2 is then used essentially as described above.
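
For example, a session on the allocated node might look like the following (a sketch using the same test data as above, with 6 bowtie2 threads so that 2 of the 8 allocated CPUs remain free for the supporting processes; the node prompt is generic):

node$ module load bowtie/2-2.2.5 samtools/1.2
node$ cd /data/$USER/test_data
node$ export BOWTIE2_INDEXES=/fdb/igenomes/Mus_musculus/UCSC/mm9/Sequence/Bowtie2Index
node$ bowtie2 --sensitive-local -p 6 --no-unal -x genome \
      -U /usr/local/apps/bowtie/TEST_DATA/ENCFF001KPB.fastq.gz \
   | samtools view -q30 -Sb - > ENCFF001KPB.bam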

Documentation

Bowtie2 has highly configurable alignment parameters along with preset modes like --sensitive-local. For more detailed information, see the Home page and the Manual. The usage message printed by bowtie2 2.2.5 is reproduced below:

No index, query, or output file specified!
Bowtie 2 version 2.2.5 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage: 
  bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>]

  <bt2-idx>  Index filename prefix (minus trailing .X.bt2).
             NOTE: Bowtie 1 and Bowtie 2 indexes are not compatible.
  <m1>       Files with #1 mates, paired with files in <m2>.
             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
  <m2>       Files with #2 mates, paired with files in <m1>.
             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
  <r>        Files with unpaired reads.
             Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
  <sam>      File for SAM output (default: stdout)

  <m1>, <m2>, <r> can be comma-separated lists (no whitespace) and can be
  specified many times.  E.g. '-U file1.fq,file2.fq -U file3.fq'.

Options (defaults in parentheses):

 Input:
  -q                 query input files are FASTQ .fq/.fastq (default)
  --qseq             query input files are in Illumina's qseq format
  -f                 query input files are (multi-)FASTA .fa/.mfa
  -r                 query input files are raw one-sequence-per-line
  -c                 <m1>, <m2>, <r> are sequences themselves, not files
  -s/--skip <int>    skip the first <int> reads/pairs in the input (none)
  -u/--upto <int>    stop after first <int> reads/pairs (no limit)
  -5/--trim5 <int>   trim <int> bases from 5'/left end of reads (0)
  -3/--trim3 <int>   trim <int> bases from 3'/right end of reads (0)
  --phred33          qualities are Phred+33 (default)
  --phred64          qualities are Phred+64
  --int-quals        qualities encoded as space-delimited integers

 Presets:                 Same as:
  For --end-to-end:
   --very-fast            -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
   --fast                 -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
   --sensitive            -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default)
   --very-sensitive       -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

  For --local:
   --very-fast-local      -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
   --fast-local           -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
   --sensitive-local      -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default)
   --very-sensitive-local -D 20 -R 3 -N 0 -L 20 -i S,1,0.50

 Alignment:
  -N <int>           max # mismatches in seed alignment; can be 0 or 1 (0)
  -L <int>           length of seed substrings; must be >3, <32 (22)
  -i <func>          interval between seed substrings w/r/t read len (S,1,1.15)
  --n-ceil <func>    func for max # non-A/C/G/Ts permitted in aln (L,0,0.15)
  --dpad <int>       include <int> extra ref chars on sides of DP table (15)
  --gbar <int>       disallow gaps within <int> nucs of read extremes (4)
  --ignore-quals     treat all quality values as 30 on Phred scale (off)
  --nofw             do not align forward (original) version of read (off)
  --norc             do not align reverse-complement version of read (off)
  --no-1mm-upfront   do not allow 1 mismatch alignments before attempting to
                     scan for the optimal seeded alignments
  --end-to-end       entire read must align; no clipping (on)
   OR
  --local            local alignment; ends might be soft clipped (off)

 Scoring:
  --ma <int>         match bonus (0 for --end-to-end, 2 for --local) 
  --mp <int>         max penalty for mismatch; lower qual = lower penalty (6)
  --np <int>         penalty for non-A/C/G/Ts in read/ref (1)
  --rdg <int>,<int>  read gap open, extend penalties (5,3)
  --rfg <int>,<int>  reference gap open, extend penalties (5,3)
  --score-min <func> min acceptable alignment score w/r/t read length
                     (G,20,8 for local, L,-0.6,-0.6 for end-to-end)

 Reporting:
  (default)          look for multiple alignments, report best, with MAPQ
   OR
  -k <int>           report up to <int> alns per read; MAPQ not meaningful
   OR
  -a/--all           report all alignments; very slow, MAPQ not meaningful

 Effort:
  -D <int>           give up extending after <int> failed extends in a row (15)
  -R <int>           for reads w/ repetitive seeds, try <int> sets of seeds (2)

 Paired-end:
  -I/--minins <int>  minimum fragment length (0)
  -X/--maxins <int>  maximum fragment length (500)
  --fr/--rf/--ff     -1, -2 mates align fw/rev, rev/fw, fw/fw (--fr)
  --no-mixed         suppress unpaired alignments for paired reads
  --no-discordant    suppress discordant alignments for paired reads
  --no-dovetail      not concordant when mates extend past each other
  --no-contain       not concordant when one mate alignment contains other
  --no-overlap       not concordant when mates overlap at all

 Output:
  -t/--time          print wall-clock time taken by search phases
  --un <path>           write unpaired reads that didn't align to <path>
  --al <path>           write unpaired reads that aligned at least once to <path>
  --un-conc <path>      write pairs that didn't align concordantly to <path>
  --al-conc <path>      write pairs that aligned concordantly at least once to <path>
  (Note: for --un, --al, --un-conc, or --al-conc, add '-gz' to the option name, e.g.
  --un-gz <path>, to gzip compress output, or add '-bz2' to bzip2 compress output.)
  --quiet            print nothing to stderr except serious errors
  --met-file <path>  send metrics to file at <path> (off)
  --met-stderr       send metrics to stderr (off)
  --met <int>        report internal counters & metrics every <int> secs (1)
  --no-head          supppress header lines, i.e. lines starting with @
  --no-sq            supppress @SQ header lines
  --rg-id <text>     set read group id, reflected in @RG line and RG:Z: opt field
  --rg <text>        add <text> ("lab:value") to @RG line of SAM header.
                     Note: @RG line only printed when --rg-id is set.
  --omit-sec-seq     put '*' in SEQ and QUAL fields for secondary alignments.

 Performance:
  -p/--threads <int> number of alignment threads to launch (1)
  --reorder          force SAM output order to match order of input reads
  --mm               use memory-mapped I/O for index; many 'bowtie's can share

 Other:
  --qc-filter        filter out reads that are bad according to QSEQ filter
  --seed <int>       seed for random number generator (0)
  --non-deterministic seed rand. gen. arbitrarily instead of using read attributes
  --version          print version information and quit
  -h/--help          print this usage message