High-Performance Computing at the NIH
BFC on Biowulf & Helix

BFC is a standalone high-performance tool for correcting sequencing errors in Illumina sequencing data. It is designed specifically for high-coverage whole-genome human data, though it also performs well on small genomes.

The BFC algorithm is a variant of the classical spectrum alignment algorithm introduced by Pevzner et al. (2001). It uses an exhaustive search to find a k-mer path through a read that minimizes a heuristic objective function jointly considering penalties on correction, quality and k-mer support. The algorithm was first implemented in the fermi assembler (by the same author) and then refined several times in fermi, fermi2 and now BFC.

In the k-mer counting phase, BFC uses a blocked bloom filter to filter out most singleton k-mers and keeps the rest in a hash table (Melsted and Pritchard, 2011). BFC takes its name from this use of a bloom filter, although other correctors such as Lighter and BLESS actually rely more heavily on bloom filters than BFC does.

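The counting phase rests on a simple observation: in high-coverage data, a k-mer seen only once is likely to contain a sequencing error. The sketch below illustrates that k-mer spectrum with awk, sort and uniq on three made-up reads. It counts k-mers exactly instead of using BFC's blocked bloom filter and hash table, and K=5 is chosen only to keep the output small; it is not a BFC setting.

```shell
#!/bin/bash
# Illustrative only: exact k-mer counting with awk/sort/uniq, NOT BFC's
# blocked bloom filter + hash table. K and the toy reads are hypothetical.
set -eo pipefail

K=5
# Three toy FASTQ reads; r3 differs from r1/r2 by one substitution (T->G at position 8)
reads=$'@r1\nACGTACGTTA\n+\nIIIIIIIIII\n@r2\nACGTACGTTA\n+\nIIIIIIIIII\n@r3\nACGTACGGTA\n+\nIIIIIIIIII'

# Take the sequence line of each 4-line record, emit all of its k-mers,
# then count how often each distinct k-mer occurs
spectrum=$(printf '%s\n' "$reads" |
  awk -v k="$K" 'NR % 4 == 2 { for (i = 1; i + k - 1 <= length($0); i++) print substr($0, i, k) }' |
  sort | uniq -c)

# Number of distinct k-mers, and how many of them occur exactly once
distinct=$(printf '%s\n' "$spectrum" | awk 'END { print NR }')
singletons=$(printf '%s\n' "$spectrum" | awk '$1 == 1 { n++ } END { print n + 0 }')
echo "distinct k-mers:  $distinct"
echo "singleton k-mers: $singletons"
```

On these reads it reports 9 distinct k-mers, 3 of them singletons: the single T-to-G substitution in r3 produces three k-mers that no other read shares, which is the signal a spectrum-based corrector looks for.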

Running on Helix

Sample session (-s 3g gives the approximate genome size, here 3 Gb for human; -t16 runs 16 threads):

helix$ module load bfc

helix$ bfc -s 3g -t16 reads.fq.gz | gzip -1 > corrected.fq.gz
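
A quick sanity check after a run is to compare the number of reads in the input and in the corrected output. The helper below, count_reads, is a hypothetical name rather than part of BFC; it assumes standard four-line FASTQ records and demonstrates itself on a tiny temporary file. Whether the two counts should match depends on the BFC options used, so a mismatch is a cue to check the BFC documentation rather than proof of a problem.

```shell
#!/bin/bash
# Sketch of a read-count check; count_reads is a hypothetical helper, not
# part of BFC. Assumes plain 4-line FASTQ records (no wrapped sequence lines).
set -eo pipefail

count_reads() {
  # Decompress to stdout, count lines, divide by 4 lines per record
  echo $(( $(gzip -cd "$1" | wc -l) / 4 ))
}

# Self-contained demo: a two-read gzipped FASTQ file in a temporary directory
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip -1 > "$tmp/demo.fq.gz"
n=$(count_reads "$tmp/demo.fq.gz")
echo "reads in demo file: $n"
rm -r "$tmp"
```

On real data, run count_reads reads.fq.gz and count_reads corrected.fq.gz and compare the two numbers.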

Running a single batch job on Biowulf

Set up a batch script along the following lines.

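#!/bin/bash
# Stop on the first error, including a bfc failure inside the pipe
set -eo pipefail

cd /data/$USER/mydir
module load bfc
# Use as many threads as CPUs allocated to the job (1 if run outside Slurm)
bfc -s 3g -t ${SLURM_CPUS_PER_TASK:-1} reads.fq.gz | gzip -1 > corrected.fq.gz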

Submit to the batch system with:

$ sbatch --cpus-per-task=16 myscript

Add the --mem option if the job needs more memory than the default allocation.


Running a swarm of jobs on Biowulf

Set up a swarm command file (e.g. /data/$USER/cmdfile). Here is a sample file:

cd /data/$USER/mydir1; bfc -s 3g -t16 reads.fq.gz | gzip -1 > corrected.fq.gz
cd /data/$USER/mydir2; bfc -s 3g -t16 reads.fq.gz | gzip -1 > corrected.fq.gz
cd /data/$USER/mydir3; bfc -s 3g -t16 reads.fq.gz | gzip -1 > corrected.fq.gz
[...]   
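
With many directories, the command file can be generated rather than typed. The sketch below defines a hypothetical helper, make_cmdfile, that writes one line per directory matching mydir* that contains a reads.fq.gz, mirroring the sample lines above; the directory naming and file names are assumptions to adapt to your own layout.

```shell
#!/bin/bash
# Sketch: write one BFC command per run directory into a swarm command file.
# make_cmdfile is a hypothetical helper; it assumes run directories are named
# mydir* and hold their input as reads.fq.gz, as in the sample file.
set -eo pipefail

make_cmdfile() {
  local base=$1 cmdfile=$2 dir
  : > "$cmdfile"                              # start from an empty file
  for dir in "$base"/mydir*/; do
    [ -f "${dir}reads.fq.gz" ] || continue    # skip directories without input
    printf 'cd %s; bfc -s 3g -t16 reads.fq.gz | gzip -1 > corrected.fq.gz\n' \
      "${dir%/}" >> "$cmdfile"
  done
}

# Usage on Biowulf:
#   make_cmdfile /data/$USER /data/$USER/cmdfile
```

Looping over a glob keeps the command file in sync with whatever run directories exist, and directories without input are skipped instead of producing commands that would fail.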

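Submit this job with:

$ swarm -f cmdfile -t 16 --module bfc

Here -t 16 allocates 16 CPUs to each command, matching the 16 threads requested with bfc -t16.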


Running an interactive job on Biowulf

Users sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

[user@biowulf]$ sinteractive --cpus-per-task=16

[user@pXXXX]$ cd /data/$USER/myruns

[user@pXXXX]$ module load bfc

[user@pXXXX]$ bfc -s 3g -t16 reads.fq.gz | gzip -1 > corrected.fq.gz

[user@pXXXX]$ exit

[user@biowulf]$ 

Documentation

https://github.com/lh3/bfc