Biowulf High Performance Computing at the NIH
gvcfgenotyper on Biowulf

gvcfgenotyper is a utility for merging and genotyping Strelka2 GVCFs. It provides basic genome VCF (GVCF) merging and genotyping, producing a multisample BCF/VCF suitable for cohort analysis. Variants are normalised and decomposed on the fly before merging. For samples that lack a particular variant, the homozygous reference confidence is estimated from the GVCF depth blocks using simple heuristics.

Documentation

https://github.com/Illumina/gvcfgenotyper

Important Notes
Submitting an interactive job

Allocate an interactive session and run gvcfgenotyper there. The sample session below copies the test data installed with the application and genotypes it:

[biowulf]$ sinteractive  --mem=5g
salloc.exe: Granted job allocation 789523
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0135 are ready for job

[cn0135]$ cd /data/$USER/

[cn0135]$ module load gvcfgenotyper

[cn0135]$ cp -r /usr/local/apps/gvcfgenotyper/test .

[cn0135]$ cd test

[cn0135]$ find . -name '*.tiny.vcf.gz' > gvcfs.txt

[cn0135]$ gvcfgenotyper -f tiny.ref.fa -l gvcfs.txt -Ob -o output.bcf

[cn0135]$ exit
salloc.exe: Job allocation 789523 has been revoked.
[biowulf]$
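
The list file passed with -l in the session above is built with find. A minimal sketch of that step with a guard against an empty list follows. The helper name make_gvcf_list and the scratch directory are assumptions made for illustration, and sorting the list is a choice made here for a stable sample order, not something the tool is known to require.

```shell
#!/bin/bash
# Sketch: build a GVCF list file and stop early if nothing matched.
# make_gvcf_list is a hypothetical helper, not part of gvcfgenotyper.
make_gvcf_list() {
    local dir="$1" list="$2"
    # Same file pattern as the find command in the session above
    find "$dir" -name '*.tiny.vcf.gz' | sort > "$list"
    # -s is true only if the file exists and is not empty
    if [ ! -s "$list" ]; then
        echo "no GVCFs found under $dir" >&2
        return 1
    fi
}

# Demonstration on a scratch directory holding empty placeholder files
demo_dir=$(mktemp -d)
touch "$demo_dir/a.tiny.vcf.gz" "$demo_dir/b.tiny.vcf.gz" "$demo_dir/notes.txt"
make_gvcf_list "$demo_dir" "$demo_dir/gvcfs.txt"
cat "$demo_dir/gvcfs.txt"
```

In a real session the first argument would be the copied test directory (or your own data directory) and the second the list file given to -l.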

Submitting a single batch job

1. Create a script file (myscript) similar to the one below

#!/bin/bash
# myscript
set -e

module load gvcfgenotyper || exit 1
cd /data/$USER/test/
gvcfgenotyper -f tiny.ref.fa -l gvcfs.txt -Ob -o output.bcf

2. Submit the script on Biowulf:

[biowulf]$ sbatch --mem=5g myscript
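
The batch script in step 1 assumes that the reference FASTA and every file named in gvcfs.txt are present in the working directory. One possible pre-flight check, which could be run before submitting, is sketched below. check_inputs is a hypothetical helper and the scratch files exist only for the demonstration; none of this is part of gvcfgenotyper.

```shell
#!/bin/bash
# Sketch: verify that the reference and every listed GVCF exist.
# check_inputs is a hypothetical helper, not part of gvcfgenotyper.
check_inputs() {
    local ref="$1" list="$2" gvcf
    [ -s "$ref" ]  || { echo "missing or empty reference: $ref" >&2; return 1; }
    [ -s "$list" ] || { echo "missing or empty list: $list" >&2; return 1; }
    # Read the list line by line; paths are resolved from the current directory
    while IFS= read -r gvcf; do
        [ -e "$gvcf" ] || { echo "listed GVCF not found: $gvcf" >&2; return 1; }
    done < "$list"
}

# Demonstration with placeholder files in a scratch directory
work=$(mktemp -d)
printf '>chr1\nACGT\n' > "$work/tiny.ref.fa"
touch "$work/s1.tiny.vcf.gz"
printf '%s\n' "$work/s1.tiny.vcf.gz" > "$work/gvcfs.txt"
check_inputs "$work/tiny.ref.fa" "$work/gvcfs.txt" && echo "inputs look fine"
```

The check only tests that files exist; it does not validate their contents.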

Using Swarm

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/$USER/cmdfile), with one command per line:

cd /data/$USER/dir1; gvcfgenotyper -f tiny.ref.fa -l gvcfs.txt -Ob -o output.bcf...
cd /data/$USER/dir2; gvcfgenotyper -f tiny.ref.fa -l gvcfs.txt -Ob -o output.bcf...
cd /data/$USER/dir3; gvcfgenotyper -f tiny.ref.fa -l gvcfs.txt -Ob -o output.bcf...
...
cd /data/$USER/dir20; gvcfgenotyper -f tiny.ref.fa -l gvcfs.txt -Ob -o output.bcf...
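
Writing twenty near-identical lines by hand is error-prone, so the command file can also be generated with a loop. The sketch below assumes the dir1 to dir20 layout of the listing above; write_cmdfile is a hypothetical helper, and any further options elided in the listing would need to be appended to the command by hand.

```shell
#!/bin/bash
# Sketch: write one swarm command line per directory.
# write_cmdfile is a hypothetical helper; dir1..dirN mirror the listing above.
write_cmdfile() {
    local base="$1" n="$2" out="$3" i=1
    : > "$out"    # create the file, or truncate it if it already exists
    while [ "$i" -le "$n" ]; do
        printf 'cd %s/dir%d; gvcfgenotyper -f tiny.ref.fa -l gvcfs.txt -Ob -o output.bcf\n' \
            "$base" "$i" >> "$out"
        i=$((i + 1))
    done
}

# '/data/$USER' is single-quoted so that $USER stays literal in the file,
# as in the listing above, and is expanded only when each command runs.
cmdfile=$(mktemp)
write_cmdfile '/data/$USER' 20 "$cmdfile"
head -n 2 "$cmdfile"
```

The resulting file can then be passed to swarm with -f in place of a hand-written command file.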

Submit the swarm job:

$ swarm -f cmdfile --module gvcfgenotyper -g 5

For more information on running swarm, see swarm.html