High-Performance Computing at the NIH
bowtie1 on Biowulf

Bowtie1 is a fast, multi-threaded, and memory-efficient aligner for short read sequences. Bowtie uses a Burrows-Wheeler index to achieve a moderate memory footprint of 2-4 GB, depending on genome size and alignment parameters. Performance generally scales well with thread count.

Note that this page only describes bowtie1. Bowtie2, which supports local alignment, gaps, and longer reads, is documented separately.

References:

Documentation
Important Notes

Bowtie1 indices are available as part of the igenomes package under

/fdb/igenomes/[organism]/[source]/[build]/Sequence/BowtieIndex/*

More information on the locally available igenomes builds/organisms is available from our scientific database index. For more information about igenomes in general, see the iGenomes readme.
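The path template above can be filled in with a small shell helper. This is only a sketch: the function name igenomes_bowtie_index is hypothetical, and it simply substitutes organism, source, and build into the template without checking that the build exists on disk.

```shell
# Hypothetical helper (not part of bowtie or the igenomes package):
# print the bowtie1 index directory for an igenomes build.
# Arguments: organism, source, build (e.g. Mus_musculus UCSC mm9).
igenomes_bowtie_index() {
    organism=$1
    src=$2
    build=$3
    printf '/fdb/igenomes/%s/%s/%s/Sequence/BowtieIndex/\n' \
        "$organism" "$src" "$build"
}
```

For example, export BOWTIE_INDEXES=$(igenomes_bowtie_index Mus_musculus UCSC mm9) would set the mm9 index directory used in the examples on this page.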

Performance considerations

The amount of time to complete the alignment of approximately 21M ChIP-Seq reads (replicates 1 and 2 of ENCODE experiment ENCSR000CDI, 36nt, H3K27ac ChIP from mouse embryonic fibroblasts) was measured as a function of the number of bowtie threads:

[Figure: bowtie1 benchmarks, run time as a function of thread count]

Based on this experiment, increasing the thread count beyond 12 yields diminishing returns, so the most resource-efficient use of bowtie1 employs at most 12 threads. If you decompress input files with zcat and pipe output through samtools, allocate extra CPUs for those processes. Otherwise the node will be slightly overloaded, and contention between threads causes a considerable performance penalty.
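The guidance above can be turned into a small calculation. The sketch below is an illustration, not a site-provided tool: the function name bowtie_threads is hypothetical, and the default of 2 reserved CPUs (one for zcat, one for samtools) is an assumption. It reserves CPUs for the helper processes and caps the bowtie thread count at 12.

```shell
# Hypothetical helper: choose a bowtie -p value from the allocated CPUs.
# $1 = allocated CPUs (e.g. $SLURM_CPUS_PER_TASK)
# $2 = CPUs to reserve for zcat/samtools (assumed default: 2)
# $3 = upper limit on bowtie threads (default: 12, from the benchmark above)
bowtie_threads() {
    cpus=$1
    helpers=${2:-2}
    max=${3:-12}
    n=$(( cpus - helpers ))
    # never go below one thread
    if [ "$n" -lt 1 ]; then n=1; fi
    # more than $max threads shows diminishing returns
    if [ "$n" -gt "$max" ]; then n=$max; fi
    echo "$n"
}
```

Within a job it could be used as bowtie -p "$(bowtie_threads "$SLURM_CPUS_PER_TASK")" followed by the other options.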

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$  module load bowtie/1
[user@cn3144 ~]$  module load samtools/1.6
[user@cn3144 ~]$  cd /lscratch/$SLURM_JOB_ID
[user@cn3144 ~]$  export BOWTIE_INDEXES=/fdb/igenomes/Mus_musculus/UCSC/mm9/Sequence/BowtieIndex/
[user@cn3144 ~]$  ls $BOWTIE_INDEXES
genome.1.ebwt  genome.2.ebwt  genome.3.ebwt  genome.4.ebwt  genome.fa
genome.rev.1.ebwt  genome.rev.2.ebwt
[user@cn3144 ~]$  zcat $BOWTIE_TEST_DATA/ENCFF001KPB.fastq.gz \
   | bowtie --phred64-quals --strata --best --all --chunkmbs 256 -m1 -n2 -p2 --sam genome - \
   | samtools view -F4 -Sb - > ENCFF001KPB.bam
# reads processed: 11623213
# reads with at least one reported alignment: 9467690 (81.46%)
# reads that failed to align: 955092 (8.22%)
# reads with alignments suppressed due to -m: 1200431 (10.33%)
Reported 9467690 alignments

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
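Before starting a long alignment it can be useful to confirm that the index directory contains the six files listed by ls in the session above. The following is a minimal sketch: the function name check_bowtie_index is hypothetical, and it assumes a small index with the .ebwt suffix (large indices use a different suffix and would not pass this check).

```shell
# Hypothetical helper: verify that a bowtie1 index is complete.
# $1 = index directory, $2 = index base name (e.g. genome)
# Assumes the six-file small-index layout shown in the session above.
check_bowtie_index() {
    dir=$1
    base=$2
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -f "$dir/$base.$suffix.ebwt" ]; then
            echo "missing index file: $dir/$base.$suffix.ebwt" >&2
            return 1
        fi
    done
    return 0
}
```

For example, check_bowtie_index "$BOWTIE_INDEXES" genome || exit 1 would stop a script early if the index is incomplete.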

Batch job
Most jobs should be run as batch jobs.

Create a batch script (e.g. bowtie1.sh). For example:

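#!/bin/bash
module load bowtie/1 samtools || exit 1

wd=$PWD
export BOWTIE_INDEXES=/fdb/igenomes/Mus_musculus/UCSC/mm9/Sequence/BowtieIndex/
# work in node-local scratch; requires --gres=lscratch:N at submission
cd "/lscratch/$SLURM_JOB_ID" || exit 1
# leave 2 of the allocated CPUs for zcat and samtools
threads=$(( SLURM_CPUS_PER_TASK > 2 ? SLURM_CPUS_PER_TASK - 2 : 1 ))
zcat "$BOWTIE_TEST_DATA/ENCFF001KPB.fastq.gz" \
   | bowtie --phred64-quals --strata --best --all --chunkmbs 256 -m1 -n2 -p "$threads" --sam genome - \
   | samtools view -F4 -Sb - > ENCFF001KPB.bam
# copy the result back to the submission directory
mv ENCFF001KPB.bam "$wd"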

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=8 --mem=4g --gres=lscratch:10 bowtie1.sh

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. bowtie1.swarm). For example:

export BOWTIE_INDEXES=/fdb/igenomes/Mus_musculus/UCSC/mm9/Sequence/BowtieIndex/; \
  zcat sample1.fastq.gz | bowtie --strata --best --all -m1 -p $(( SLURM_CPUS_PER_TASK - 2 )) --sam genome - | samtools view -F4 -Sb - > sample1.bam
export BOWTIE_INDEXES=/fdb/igenomes/Mus_musculus/UCSC/mm9/Sequence/BowtieIndex/; \
  zcat sample2.fastq.gz | bowtie --strata --best --all -m1 -p $(( SLURM_CPUS_PER_TASK - 2 )) --sam genome - | samtools view -F4 -Sb - > sample2.bam
export BOWTIE_INDEXES=/fdb/igenomes/Mus_musculus/UCSC/mm9/Sequence/BowtieIndex/; \
  zcat sample3.fastq.gz | bowtie --strata --best --all -m1 -p $(( SLURM_CPUS_PER_TASK - 2 )) --sam genome - | samtools view -F4 -Sb - > sample3.bam

Submit this job using the swarm command.

swarm -f bowtie1.swarm -g 4 -t 8 --module bowtie/1,samtools
where
-g #  Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #  Number of threads/CPUs required for each process (1 line in the swarm command file)
--module bowtie/1,samtools  Loads the bowtie/1 and samtools modules for each subjob in the swarm
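
For more than a handful of samples, the swarmfile can be generated rather than written by hand. The sketch below is one possible approach, not a site-provided tool: the function name make_bowtie_swarm is hypothetical, it assumes input files end in .fastq.gz, and it writes each command on a single line. The generated commands pass the allocated CPU count minus 2 to bowtie via -p, on the assumption that 2 CPUs are left for zcat and samtools.

```shell
# Hypothetical helper: print one swarm command line per input file.
# $1 = bowtie1 index directory, remaining arguments = *.fastq.gz files.
# The single-quoted format string keeps $SLURM_CPUS_PER_TASK literal,
# so it is expanded when the subjob runs, not when the file is generated.
make_bowtie_swarm() {
    index_dir=$1
    shift
    for fq in "$@"; do
        sample=$(basename "$fq" .fastq.gz)
        printf 'export BOWTIE_INDEXES=%s; zcat %s | bowtie --strata --best --all -m1 -p $(( SLURM_CPUS_PER_TASK - 2 )) --sam genome - | samtools view -F4 -Sb - > %s.bam\n' \
            "$index_dir" "$fq" "$sample"
    done
}
```

For example, make_bowtie_swarm /fdb/igenomes/Mus_musculus/UCSC/mm9/Sequence/BowtieIndex/ *.fastq.gz > bowtie1.swarm writes a swarmfile that can be submitted with the swarm command shown above.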