VCFtools contains a Perl API (Vcf.pm) and a number of Perl scripts for common tasks on VCF files, such as validation, merging, intersection, and complements. The Perl tools support all versions of the VCF specification (3.2, 3.3, and 4.0); users are nevertheless encouraged to use the latest version, VCFv4.0. VCFtools has mainly been used with diploid data, but the Perl tools aim to support polyploid data as well.
VCFtools is maintained and developed by Adam Auton, Peter Danecek and collaborators.
Allocate an interactive session and run the program. Sample session:
    [user@biowulf]$ sinteractive
    salloc.exe: Pending job allocation 46116226
    salloc.exe: job 46116226 queued and waiting for resources
    salloc.exe: job 46116226 has been allocated resources
    salloc.exe: Granted job allocation 46116226
    salloc.exe: Waiting for resource configuration
    salloc.exe: Nodes cn3144 are ready for job

    [user@cn3144 ~]$ module load vcftools
    [user@cn3144 ~]$ compare-vcf inputFile1 inputFile2
    [user@cn3144 ~]$ exit
    salloc.exe: Relinquishing job allocation 46116226
    [user@biowulf ~]$
Create a batch input file (e.g. vcftools.sh). For example:
    #!/bin/bash
    set -e
    module load vcftools
    compare-vcf inputFile1 inputFile2
Submit this job using the Slurm sbatch command.
sbatch --mem=10g vcftools.sh
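A batch job that starts with a missing input only fails after it has waited in the queue, so it can help to check the inputs first. The following is a minimal sketch of such a check. The `check_inputs` function is a hypothetical helper, not part of VCFtools or Slurm, and the file names are the same placeholders used above.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of VCFtools or Slurm):
# return non-zero if any named input file is missing or empty.
check_inputs() {
  local f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty input: $f" >&2
      return 1
    fi
  done
  return 0
}

# Possible use inside vcftools.sh, after "module load vcftools"
# (commented out so the sketch itself runs anywhere):
# check_inputs inputFile1 inputFile2 && compare-vcf inputFile1 inputFile2
```

Combined with `set -e`, a failed check would stop the script before `compare-vcf` is called.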
Create a swarmfile (e.g. vcftools.swarm). For example:
    cd dir1;compare-vcf inputFile1 inputFile2
    cd dir2;compare-vcf inputFile1 inputFile2
    cd dir3;compare-vcf inputFile1 inputFile2
    cd dir4;compare-vcf inputFile1 inputFile2
    cd dir5;compare-vcf inputFile1 inputFile2
Submit this job using the swarm command.
    swarm -f vcftools.swarm -g 10 --module vcftools

where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file) |
--module vcftools | Loads the vcftools module for each subjob in the swarm |
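When there are many directories, the swarmfile can be generated rather than typed by hand. The following is a minimal sketch that writes one line per directory in the same form as the example above. The `make_swarm_lines` function is a hypothetical helper, not part of swarm or VCFtools, and it assumes directory names without spaces or shell metacharacters.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of swarm or VCFtools):
# print one swarm command line per directory given as an argument.
# Assumes directory names contain no spaces or shell metacharacters.
make_swarm_lines() {
  local d
  for d in "$@"; do
    printf 'cd %s;compare-vcf inputFile1 inputFile2\n' "$d"
  done
}

# Write the swarmfile for dir1 through dir5.
make_swarm_lines dir1 dir2 dir3 dir4 dir5 > vcftools.swarm
```

The resulting vcftools.swarm can then be submitted with the swarm command shown above.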