High-Performance Computing at the NIH
BSMAP on Biowulf & Helix

BSMAP is a short-read mapping program for bisulfite sequencing reads. Bisulfite treatment converts unmethylated cytosines into uracils (sequenced as thymines) and leaves methylated cytosines unchanged, providing a way to study DNA cytosine methylation at single-nucleotide resolution. BSMAP aligns the Ts in the reads to both Cs and Ts in the reference.

RRBSMAP is a version of BSMAP specifically designed for reduced representation bisulfite sequencing (RRBS). It indexes the genome only at the enzyme digestion sites, which guarantees that all reads map to digestion sites and greatly reduces CPU/memory usage. Since BSMAP 2.0, RRBSMAP has been merged into BSMAP.
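The conversion BSMAP accounts for can be illustrated with a one-liner: in a fully unmethylated read, every C is sequenced as T, which is a simple character substitution (a sketch; the sequence below is made up):

```shell
# In-silico bisulfite conversion of a fully unmethylated read:
# every unmethylated C is read out as T (hypothetical sequence).
echo "ACGTCG" | tr C T
```

A methylated C would survive as C, which is why BSMAP must allow read Ts to match both Cs and Ts in the reference.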

Running on Helix

$ module load bsmap
$ cd /data/$USER/dir
$ bsmap
Usage: bsmap [options]
-a <str> query file, FASTA/FASTQ/BAM format
-d <str> reference sequences file, FASTA format
-o <str> output alignment file, BSP/SAM/BAM format; if omitted, output is written to STDOUT in SAM format
-p <int> number of processors to use, default=8

$ bsmap -a infile -d ref.fa -o out.bam -p 4

Running a single batch job on Biowulf

1. Create a batch script similar to the one below.

#!/bin/bash

module load bsmap
cd /data/$USER/dir
bsmap -a infile -d ref.fa -o out.bam -p $SLURM_CPUS_PER_TASK

2. Submit the script on Biowulf:

$ sbatch --cpus-per-task=8 jobscript

--cpus-per-task: required here since bsmap uses 8 threads by default.

$ sbatch --cpus-per-task=8 --mem=20g jobscript

--mem: allocate more memory than the default (8 CPUs x 2g = 16g)

Running a swarm of jobs on Biowulf

Setup a swarm command file:

  cd /data/$USER/dir1; bsmap -a infile -d ref.fa -o out.bam -p $SLURM_CPUS_PER_TASK
  cd /data/$USER/dir2; bsmap -a infile -d ref.fa -o out.bam -p $SLURM_CPUS_PER_TASK
  cd /data/$USER/dir3; bsmap -a infile -d ref.fa -o out.bam -p $SLURM_CPUS_PER_TASK
	[......]
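For many samples, the swarm file does not need to be written by hand. A short loop can generate one command per sample directory (a sketch; dir1, dir2, dir3 are hypothetical names — substitute your own):

```shell
# Generate a swarm command file with one bsmap command per
# sample directory (directory names here are hypothetical).
# $USER and $SLURM_CPUS_PER_TASK are escaped so they expand
# at job runtime, not when the file is written.
for d in dir1 dir2 dir3; do
  echo "cd /data/\$USER/$d; bsmap -a infile -d ref.fa -o out.bam -p \$SLURM_CPUS_PER_TASK"
done > swarmfile
```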
  

Submit the swarm file:

  $ swarm -t 8 -f swarmfile --module bsmap

-t: number of threads per command; bsmap uses 8 by default
-f: specify the swarm file name
--module: load the required module for each command in the file

To allocate more memory:

  $ swarm -t 8 -f swarmfile -g 20 --module bsmap

-g: allocate more memory

For more information regarding running swarm, see swarm.html

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive --cpus-per-task=8 
salloc.exe: Granted job allocation 16535

cn999$ module load bsmap
cn999$ cd /data/$USER/dir
cn999$ bsmap -a infile -d ref.fa -o out.bam -p $SLURM_CPUS_PER_TASK
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use --mem. For example

biowulf$ sinteractive --cpus-per-task=8 --mem=20g

Documentation

https://code.google.com/p/bsmap/