High-Performance Computing at the NIH
Freebayes on Biowulf & Helix

FreeBayes is a high-performance, flexible, and open-source Bayesian genetic variant detector. It operates on BAM alignment files, which are produced by most contemporary short-read aligners.

In addition to substantial performance improvements over its predecessors (PolyBayes, GigaBayes, and BamBayes), it expands the scope of SNP and small-indel variant calling to populations of individuals with heterogeneous copy number.

Freebayes is developed by Erik Garrison and Gabor Marth.

The examples below are adapted from the file /usr/local/apps/freebayes/examples/pipeline.sh

Running on Helix

$ module load freebayes
$ cd /data/$USER/dir
$ freebayes \
    --min-alternate-count 2 \
    --min-alternate-qsum 40 \
    --pvar 0.0001 \
    --use-mapping-quality \
    --posterior-integration-limits 1,3 \
    --genotype-variant-threshold 4 \
    --site-selection-max-iterations 3 \
    --genotyping-max-iterations 25 \
    --max-complex-gap 3 \
    --cnv-map YourCnvMapFile \
    --bam YourBamFile \
    --region YourRegion \
    -f YourReferenceFile \
    | gzip > YourOutDir/YourRegion.vcf.gz
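Because the command above handles one region at a time, a list of regions is needed to cover a whole reference. The following is a minimal sketch, assuming the reference has a samtools-style .fai index (sequence name in column 1, length in column 2) and that regions are written as 0-based, half-open chrom:start-end intervals; check the --region help of the installed freebayes version before relying on that format. The function name make_regions and the chunk size are hypothetical.

```shell
#!/bin/bash
# make_regions FAI SIZE
# Print one region per line (chrom:start-end) covering every sequence
# listed in a .fai index, in chunks of at most SIZE bases.
# Assumption: 0-based, half-open coordinates.
make_regions() {
    local fai=$1 size=$2
    # Refuse a non-positive chunk size, which would loop forever.
    [ "$size" -gt 0 ] || return 1
    awk -v size="$size" '{
        for (start = 0; start < $2; start += size) {
            end = start + size
            if (end > $2) end = $2      # clip the last chunk to the sequence length
            printf "%s:%d-%d\n", $1, start, end
        }
    }' "$fai"
}
```

For example, make_regions YourReferenceFile.fai 1000000 > regions.txt would write the list to a file, and each line could then be supplied as YourRegion.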

Running a single batch job on Biowulf

1. Create a script file containing lines similar to the following:


#!/bin/bash
module load freebayes
cd /data/$USER/dir
freebayes --fasta-reference h.sapiens.fasta infile.bam

2. Submit the script on Biowulf:

$ sbatch jobscript

If more memory is required (the default is 4 GB), specify --mem=Mg, where M is the amount in gigabytes. For example:

$ sbatch --mem=10g jobscript
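The memory request can also be recorded inside the script with an #SBATCH directive, which is standard Slurm behavior, so that it does not have to be repeated on the command line. The sketch below writes such a script; the file names are the placeholders used above, and the output name infile.vcf is an assumption added for illustration.

```shell
#!/bin/bash
# Write a batch script that carries its own memory request.
# The quoted 'EOF' keeps $USER literal so it is expanded when the job runs.
cat > jobscript <<'EOF'
#!/bin/bash
#SBATCH --mem=10g
module load freebayes
cd /data/$USER/dir
# infile.vcf is a hypothetical output name; freebayes writes VCF to stdout.
freebayes --fasta-reference h.sapiens.fasta infile.bam > infile.vcf
EOF
```

The script is then submitted with a plain sbatch jobscript. Options given on the sbatch command line take precedence over #SBATCH directives in the script.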

Running a swarm of jobs on Biowulf

Set up a swarm command file, with one command per line:

  cd /data/$USER/dir1; freebayes --fasta-reference h.sapiens.fasta infile.bam
  cd /data/$USER/dir2; freebayes --fasta-reference h.sapiens.fasta infile.bam
  cd /data/$USER/dir3; freebayes --fasta-reference h.sapiens.fasta infile.bam
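
When there are many directories, the command file can be generated rather than typed by hand. A minimal sketch, assuming the directory names dir1 to dir3 and the file names shown above (all placeholders):

```shell
#!/bin/bash
# Generate a swarm command file with one freebayes command per directory.
swarmfile=swarmfile
: > "$swarmfile"    # start from an empty file
for d in dir1 dir2 dir3; do
    # Single quotes keep $USER literal so it is expanded when the job runs.
    printf 'cd /data/$USER/%s; freebayes --fasta-reference h.sapiens.fasta infile.bam\n' "$d" >> "$swarmfile"
done
```

Inspect the generated file before submitting it, since every line becomes a separate job.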

Submit the swarm file. The -f option specifies the swarm file name, and --module freebayes loads the freebayes module for each command line in the file:

  $ swarm -f swarmfile --module freebayes

If each command needs more memory, use -g to set the allocation in gigabytes. The example below allocates 10 GB for each command:

  $ swarm -f swarmfile -g 10 --module freebayes

For more information on running swarm, see swarm.html

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as shown below and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load freebayes
cn999$ cd /data/$USER/dir
cn999$ freebayes --fasta-reference h.sapiens.fasta infile.bam

cn999$ exit


Make sure to exit the job once finished.

If more memory is needed, use --mem. For example:

biowulf$ sinteractive --mem=8g