vcflib on Biowulf
Description
vcflib is a C++ library for parsing Variant Call Format (VCF) files and a set of command line tools based on that library.
The following tools are currently available:
vcf2dag,
vcf2fasta,
vcf2tsv,
vcfaddinfo,
vcfafpath,
vcfallelicprimitives,
vcfaltcount,
vcfannotate,
vcfannotategenotypes,
vcfbreakmulti,
vcfcat,
vcfcheck,
vcfclassify,
vcfcleancomplex,
vcfcombine,
vcfcommonsamples,
vcfcountalleles,
vcfcreatemulti,
vcfdistance,
vcfecho,
vcfentropy,
vcfevenregions,
vcffilter,
vcffixup,
vcfflatten,
vcfgeno2alleles,
vcfgeno2haplo,
vcfgenosamplenames,
vcfgenosummarize,
vcfgenotypecompare,
vcfgenotypes,
vcfglbound,
vcfglxgt,
vcfhetcount,
vcfhethomratio,
vcfindex,
vcfinfo2qual,
vcfinfosummarize,
vcfintersect,
vcfkeepgeno,
vcfkeepinfo,
vcfkeepsamples,
vcfleftalign,
vcflength,
vcfnumalt,
vcfoverlay,
vcfparsealts,
vcfprimers,
vcfqual2info,
vcfrandom,
vcfrandomsample,
vcfremap,
vcfremoveaberrantgenotypes,
vcfremovesamples,
vcfroc,
vcfsample2info,
vcfsamplediff,
vcfsamplenames,
vcfsitesummarize,
vcfstats,
vcfstreamsort,
vcfuniq,
vcfuniqalleles
There may be multiple versions of vcflib available. An easy way of selecting the version is to use modules. To see the modules available, type
module avail vcflib
To select a module use
module load vcflib/[version]
where [version] is the version of choice.
Environment variables set
- $PATH
- $CPATH
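Because the module works by extending $PATH, one quick way to confirm that a load took effect is to check that the tools resolve. The following is a minimal sketch rather than part of the Biowulf setup; the helper name have_tools is hypothetical.

```shell
#!/bin/bash
# have_tools is a hypothetical helper (not provided by vcflib or Biowulf).
# It succeeds only if every named command can be found on $PATH.
have_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            return 1
        fi
    done
}

# Typical use after loading the module (commented out so the sketch
# runs on any machine):
#   module load vcflib && have_tools vcf2tsv vcffilter
```

The function reports the first missing tool on stderr and returns a non-zero status, so it can guard the rest of a script with `|| exit 1`.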
Interactive job
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ vcf=/fdb/GATK_resource_bundle/hg19-2.8/CEUTrio.HiSeq.WGS.b37.bestPractices.hg19.vcf.gz
[user@cn3144 ~]$ vcfsamplenames $vcf
NA12878
NA12891
NA12892
[user@cn3144 ~]$ zcat $vcf | vcfcountalleles
12935193
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Batch job on Biowulf
Create a batch script (for example vcf2tsv.sh) similar to the following:
#! /bin/bash

function fail() {
    echo "$@" >&2
    exit 1
}

rb=/fdb/GATK_resource_bundle/hg19-2.8
module load vcflib || fail "could not load vcflib module"
module load samtools/1.2 || fail "could not load samtools module"
tabix -h ${rb}/CEUTrio.HiSeq.WGS.b37.bestPractices.hg19.vcf.gz chr1:1-100000 \
    | vcf2tsv > CEUTrio.out

Submit to the queue with sbatch:
b2$ sbatch vcf2tsv.sh
Swarm of jobs on Biowulf
Create a swarm command file (for example vcfannotate.swarm) similar to the following, with one command per line:
vcfannotate -b enhancers.bed -k enh sample1.vcf > sample1_anno.vcf
vcfannotate -b enhancers.bed -k enh sample2.vcf > sample2_anno.vcf
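For more than a handful of samples, the command file can be generated instead of typed by hand. The sketch below assumes input files end in .vcf and that the same BED file and key (enh) apply to every sample; the function name make_swarm is hypothetical and only prints the command lines, it does not run vcfannotate.

```shell
#!/bin/bash
# make_swarm is a hypothetical helper: it prints one vcfannotate command
# per input VCF, in the same form as the example swarm file.
# Usage: make_swarm BED_FILE VCF [VCF ...]
make_swarm() {
    local bed=$1
    shift
    local vcf
    for vcf in "$@"; do
        # ${vcf%.vcf} strips the .vcf suffix to build the output name
        printf 'vcfannotate -b %s -k enh %s > %s_anno.vcf\n' \
            "$bed" "$vcf" "${vcf%.vcf}"
    done
}

# Print the command lines for two samples; redirect to a file
# (e.g. > vcfannotate.swarm) to create the swarm command file.
make_swarm enhancers.bed sample1.vcf sample2.vcf
```

Passing a glob such as `make_swarm enhancers.bed *.vcf > vcfannotate.swarm` would cover every VCF in the current directory under the same assumptions.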
Submit to the queue with swarm, here requesting 5 GB of memory per process (-g 5):
b2$ swarm -f vcfannotate.swarm -g 5