Biowulf High Performance Computing at the NIH
bison on Biowulf

Bison is an aligner for bisulfite-converted short reads. It can natively use high-performance computing clusters, running under MPI as in the examples below, to increase speed.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (commands typed by the user follow the $ prompt):

[user@biowulf]$ sinteractive --ntasks=4 
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ tar xvf /usr/local/apps/bison/bison_tutorial.tar

[user@cn3144 ~]$ cd bison_tutorial

[user@cn3144 ~]$ module load bison
[+] Loading bison  0.4.0  on cn4242
[+] Loading openmpi 2.1.2  for GCC 4.8.5
[+] Loading bowtie  2-2.3.4
[+] Loading samtools 1.6  ...

[user@cn3144 ~]$ bison_index -q genomes/E.Coli
Output will be placed under genomes/E.Coli/bisulfite_genome
Reading in and converting genomes/E.Coli/Escherichia_coli.GCA_000597845.1.23.dna.genome.fa...
Now executing: bowtie2-build  -q genomes/E.Coli/bisulfite_genome/GA_conversion/genome.fa genomes/E.Coli/bisulfite_genome/GA_conversion/BS_GA
Now executing: bowtie2-build  -q genomes/E.Coli/bisulfite_genome/CT_conversion/genome.fa genomes/E.Coli/bisulfite_genome/CT_conversion/BS_CT
Removing genomes/E.Coli/bisulfite_genome/CT_conversion/genome.fa
Removing genomes/E.Coli/bisulfite_genome/GA_conversion/genome.fa

[user@cn3144 ~]$ time mpirun --mca mpi_warn_on_fork 0 -n 3  bison --directional \
      -g genomes/E.Coli -1 reads/100_1.fq.gz -2 reads/100_2.fq.gz
cn4242 has rank 0
cn4242 has rank 1
cn4242 has rank 2
Allocating space for 3000000000 characters
Will C->T convert reads/100_1.fq.gz and store the results in reads/100_1.CT.fq.gz.
Will G->A convert reads/100_2.fq.gz and store the results in reads/100_2.GA.fq.gz.
reads/100_1.fq.gz contained 100000 reads
reads/100_2.fq.gz contained 100000 reads
Reading in genomes/E.Coli/Escherichia_coli.GCA_000597845.1.23.dna.genome.fa
Finished genomes/E.Coli/Escherichia_coli.GCA_000597845.1.23.dna.genome.fa
Alignment metrics will be printed to reads/100_1.txt
Sending start to node 1
Node 1 executing: bowtie2 -q --reorder  --score-min 'L,-0.6,-0.6' --norc -x genomes/E.Coli/bisulfite_genome/CT_conversion/BS_CT -1 reads/100_1.CT.fq.gz -2 reads/100_2.GA.fq.gz
Sending start to node 2
Node 2 executing: bowtie2 -q --reorder  --score-min 'L,-0.6,-0.6' --nofw -x genomes/E.Coli/bisulfite_genome/GA_conversion/BS_GA -1 reads/100_1.CT.fq.gz -2 reads/100_2.GA.fq.gz
Node 2 began sending reads @Tue Apr 24 16:47:32 2018
Node 1 began sending reads @Tue Apr 24 16:47:32 2018
Started slurping @Tue Apr 24 16:47:32 2018
100000 reads Tue Apr 24 16:47:39 2018
100000 reads; of these:
  100000 (100.00%) were paired; of these:
    53464 (53.46%) aligned concordantly 0 times
    44030 (44.03%) aligned concordantly exactly 1 time
    2506 (2.51%) aligned concordantly >1 times
    ----
    53464 pairs aligned concordantly 0 times; of these:
      4550 (8.51%) aligned discordantly 1 time
    ----
    48914 pairs aligned 0 times concordantly or discordantly; of these:
      97828 mates make up the pairs; of these:
        96925 (99.08%) aligned 0 times
        169 (0.17%) aligned exactly 1 time
        734 (0.75%) aligned >1 times
51.54% overall alignment rate
Node 1 finished sending reads @Tue Apr 24 16:47:47 2018
	(15.000000 sec elapsed)
Exiting worker node 1
Returning from worker node 1
100000 reads; of these:
  100000 (100.00%) were paired; of these:
    53748 (53.75%) aligned concordantly 0 times
    43760 (43.76%) aligned concordantly exactly 1 time
    2492 (2.49%) aligned concordantly >1 times
    ----
    53748 pairs aligned concordantly 0 times; of these:
      4509 (8.39%) aligned discordantly 1 time
    ----
    49239 pairs aligned 0 times concordantly or discordantly; of these:
      98478 mates make up the pairs; of these:
        97582 (99.09%) aligned 0 times
        171 (0.17%) aligned exactly 1 time
        725 (0.74%) aligned >1 times
51.21% overall alignment rate
Node 2 finished sending reads @Tue Apr 24 16:47:47 2018
	(15.000000 sec elapsed)
Finished slurping @Tue Apr 24 16:47:47 2018
	(15.000000 seconds elapsed)
Exiting worker node 2
Returning from worker node 2
200000 reads Tue Apr 24 16:47:47 2018
Closing input files
Alignment:
	200000 total reads analysed
	196718 reads mapped ( 98.36%).

	89173 concordant pairs
	9186 discordant pairs
	0 reads aligned as singletons

Number of hits aligning to each of the orientations:
	98712	 49.36%	OT (original top strand)
	98006	 49.00%	OB (original bottom strand)

Cytosine Methylation (N.B., statistics from overlapping mates are added together!):
	Number of C's in a CpG context: 1475324
	Percentage of methylated C's in a CpG context:  79.80%
	Number of C's in a CHG context: 1256935
	Percentage of methylated C's in a CHG context:   1.28%
	Number of C's in a CHH context: 2257498
	Percentage of methylated C's in a CHH context:   1.29%

real	0m19.724s
user	1m17.879s
sys	0m0.850s    

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
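
The run above reports that alignment metrics are written to reads/100_1.txt. As a minimal sketch, assuming that file contains the same summary lines that are printed at the end of the run, the mapped-read percentage could be extracted with a small shell function. The name mapped_percent is a hypothetical helper for illustration and is not part of bison.

```shell
#!/bin/bash
# Hypothetical helper (not part of bison): print the mapped-read percentage
# found in a bison metrics file.
# Assumption: the file holds a line of the form
#     196718 reads mapped ( 98.36%).
# as shown in the on-screen summary of the sample session.
mapped_percent() {
    local metrics_file="$1"
    # Print only the number between "(" and "%" on the "reads mapped" line.
    sed -n 's/.*reads mapped ([[:space:]]*\([0-9.]*\)%).*/\1/p' "$metrics_file"
}

# Example: run from inside bison_tutorial after the alignment has finished.
if [ -f reads/100_1.txt ]; then
    mapped_percent reads/100_1.txt
fi
```

If the metrics file is laid out differently from the on-screen summary, the function prints nothing, so an empty result is a signal to inspect the file by hand.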

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. bison.sh). For example:

#!/bin/bash
set -e

module load bison

tar xf /usr/local/apps/bison/bison_tutorial.tar
cd bison_tutorial
bison_index -q genomes/E.Coli

# Run one fewer MPI rank than the number of allocated Slurm tasks
# (the interactive example above used 3 ranks with --ntasks=4).
tasks=$(( SLURM_NTASKS - 1 ))

mpirun --mca mpi_warn_on_fork 0 -n "$tasks" bison --directional \
      -g genomes/E.Coli -1 reads/100_1.fq.gz -2 reads/100_2.fq.gz

Submit this job using the Slurm sbatch command.

sbatch --ntasks=4 --ntasks-per-core=1 bison.sh
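
The batch script derives the mpirun rank count from SLURM_NTASKS, which is only set inside a Slurm allocation. The following is a minimal sketch of a guarded version of that arithmetic, so that a missing or too-small task count stops the script with a message instead of passing a bad value to mpirun. The function name mpi_task_count and the lower bound of 2 tasks are assumptions made for illustration, not bison or Biowulf requirements.

```shell
#!/bin/bash
# Hypothetical helper: compute the value for "mpirun -n" as one fewer than the
# Slurm task count. Reads the count from the first argument if given,
# otherwise from SLURM_NTASKS.
mpi_task_count() {
    local ntasks="${1:-${SLURM_NTASKS:-}}"
    # Reject empty values, non-digits, and leading zeros.
    case "$ntasks" in
        ''|*[!0-9]*|0*)
            echo "mpi_task_count: invalid task count '${ntasks}'" >&2
            return 1
            ;;
    esac
    # Assumed lower bound: at least 2 tasks, so at least 1 MPI rank remains.
    if [ "$ntasks" -lt 2 ]; then
        echo "mpi_task_count: need at least 2 tasks, got ${ntasks}" >&2
        return 1
    fi
    echo $(( ntasks - 1 ))
}
```

In bison.sh this could replace the plain arithmetic line with tasks=$(mpi_task_count). Because the script uses set -e, a failed check ends the job before mpirun is started.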