High-Performance Computing at the NIH
VCFtools on Biowulf & Helix

VCFtools contains a Perl API (Vcf.pm) and a number of Perl scripts for common tasks with VCF files, such as validation, merging, intersection, and complements. The Perl tools support all versions of the VCF specification (3.2, 3.3, and 4.0); nevertheless, users are encouraged to use the latest version, VCFv4.0. VCFtools has mainly been used with diploid data, but the Perl tools aim to support polyploid data as well.
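
Because a VCF file declares its version on its first line (for example `##fileformat=VCFv4.0`), the version can be checked before running the Perl tools. The following is a minimal sketch for uncompressed files; the function name `vcf_version` and the sample file are illustrative assumptions, not part of VCFtools.

```shell
# Hypothetical helper (not part of VCFtools): print the version declared
# on the first line of an uncompressed VCF file, e.g. "VCFv4.0".
vcf_version() {
    head -n 1 "$1" | sed -n 's/^##fileformat=//p'
}

# Build a tiny example file so the sketch is self-contained.
sample_vcf="$(mktemp)"
printf '##fileformat=VCFv4.0\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n20\t14370\trs6054257\tG\tA\t29\tPASS\t.\n' > "$sample_vcf"

# Print the declared version of the example file.
vcf_version "$sample_vcf"
```

If the first line is not a `##fileformat=` header, the function prints nothing, which can be treated as a sign that the file needs a closer look.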

VCFtools is maintained and developed by Adam Auton, Petr Danecek and collaborators. VCFtools paper.

Running VCFtools on Helix

Sample session:

helix$ module load vcftools

helix$ module list

Currently Loaded Modules:
  1) vcftools/0.1.12b

helix$ compare-vcf inputFile1 inputFile2

Running a single batch job on Biowulf

Set up a batch script along the following lines.


#!/bin/bash

cd /data/$USER/mydir
module load vcftools

compare-vcf inputFile1 inputFile2

Submit to the batch system with:

$ sbatch myscript

If the job needs more memory than the default (4 GB), use the --mem flag:

$ sbatch --mem=10g myscript
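
The memory request can also be recorded in the batch script itself as an `#SBATCH` directive, so it does not have to be repeated on the command line. The sketch below writes such a script to a temporary file and only checks its syntax; nothing is submitted, and the temporary path is an illustrative assumption.

```shell
# Write a batch script that carries its own memory request.
# The temporary path is a placeholder; on Biowulf the script would
# normally live in your own directory.
script="$(mktemp)"
cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH --mem=10g

cd /data/$USER/mydir
module load vcftools

compare-vcf inputFile1 inputFile2
EOF

# Parse the script without running it.
bash -n "$script" && echo "syntax OK"
```

A script like this would be submitted with a plain `sbatch`; a `--mem` option given on the command line takes precedence over the directive in the file.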


Running a swarm of batch jobs on Biowulf

Set up a swarm command file (e.g. /data/$USER/cmdfile). Here is a sample file:

cd /data/$USER/mydir1; compare-vcf inputFile1 inputFile2
cd /data/$USER/mydir2; compare-vcf inputFile1 inputFile2
cd /data/$USER/mydir3; compare-vcf inputFile1 inputFile2
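
When many directories follow the same pattern, the command file can be generated with a loop rather than typed by hand. This is a minimal sketch that reproduces the three sample lines above; the temporary output path and the variable name `cmdfile` are illustrative assumptions.

```shell
# Generate one swarm command line per run directory.
# The file is written to a temporary location here; on Biowulf it
# would typically be saved under /data/$USER.
cmdfile="$(mktemp)"
for i in 1 2 3; do
    # Single quotes keep $USER literal, so it is expanded when the
    # command runs rather than when the file is generated.
    printf 'cd /data/$USER/mydir%s; compare-vcf inputFile1 inputFile2\n' "$i"
done > "$cmdfile"

# Show the generated command file.
cat "$cmdfile"
```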

Submit this job with:

$ swarm -f cmdfile --module vcftools

-f: swarm command file name
--module: module to load to set up the environment before the commands run

If the commands need more memory than the default (4 GB), use the -g flag to request memory in GB:

$ swarm -g 10 -f cmdfile --module vcftools


Running an interactive job on Biowulf

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node; instead, allocate an interactive node as described below and run the job there.

[user@biowulf]$ sinteractive
      salloc.exe: Granted job allocation 1528
[user@pXXXX]$ cd /data/$USER/myruns

[user@pXXXX]$ module load vcftools

[user@pXXXX]$ vcftools command

[user@pXXXX]$ exit
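
One way to avoid starting heavy work on the login node by accident is to check for a Slurm allocation first. The sketch below assumes that `SLURM_JOB_ID` is set inside an allocation, which is standard Slurm behaviour but has not been verified here for `sinteractive` sessions specifically; the function name `in_slurm_job` is hypothetical.

```shell
# Hypothetical guard: succeed only when running inside a Slurm job.
# Assumes Slurm exports SLURM_JOB_ID inside an allocation.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
    echo "Inside Slurm job ${SLURM_JOB_ID}; safe to run vcftools here."
else
    echo "No Slurm allocation detected; run sinteractive first."
fi
```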