High-Performance Computing at the NIH
vcf2maf

To convert a VCF into a MAF, each variant must be mapped to only one of all the gene transcripts/isoforms it might affect. Selecting a single effect per variant is often subjective, so this project aims to make the selection criteria smarter, reproducible, and more configurable, with default criteria that lean towards best practices.

Multiple versions of vcf2maf are available. An easy way to select a version is to use modules. To see the available modules, type

module avail vcf2maf

To select a module, type

module load vcf2maf/[ver]

where [ver] is the version of choice.

Environment variables set:

Special Considerations

vcf2maf.pl depends heavily on VEP (the Ensembl Variant Effect Predictor), which it runs to annotate each variant. It also requires at least 20 GB of memory and a minimum of 4 CPUs.
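To fail fast when a job has been given too few resources, a batch script could check its allocation before calling vcf2maf.pl. The sketch below is a hypothetical helper, `check_vcf2maf_resources`, that reads Slurm's `SLURM_CPUS_PER_TASK` and `SLURM_MEM_PER_NODE` variables and returns nonzero if they fall below 4 CPUs or 20 GB. It assumes memory was requested with `--mem` (which sets `SLURM_MEM_PER_NODE`, in MB); allocations made with `--mem-per-cpu` are not handled.

```shell
#!/bin/bash
# check_vcf2maf_resources: hypothetical helper. Returns 1 (with a message on stderr)
# if the Slurm allocation is below 4 CPUs or 20 GB of memory, otherwise 0.
check_vcf2maf_resources() {
  local min_cpus=4
  local min_mem_mb=$((20 * 1024))          # 20 GB expressed in MB
  local cpus="${SLURM_CPUS_PER_TASK:-1}"   # unset unless --cpus-per-task was given
  local mem_mb="${SLURM_MEM_PER_NODE:-0}"  # assumed set (in MB) only when --mem was given
  if (( cpus < min_cpus )); then
    echo "vcf2maf: need at least $min_cpus CPUs, have $cpus" >&2
    return 1
  fi
  if (( mem_mb < min_mem_mb )); then
    echo "vcf2maf: need at least $min_mem_mb MB of memory, have $mem_mb MB" >&2
    return 1
  fi
}
```

In a batch script it would be called as `check_vcf2maf_resources || exit 1` before the vcf2maf.pl line.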

On Helix

Sample session:

module load vcf2maf VEP/86
vcf2maf.pl --input-vcf test.vcf --output-maf test.maf --vep-path $VEP_HOME --vep-data $VEPCACHEDIR \
  --ref-fasta $VEPCACHEDIR/GRCh37.fa --filter-vcf $VEPCACHEDIR/ExAC.r0.3.sites.vep.vcf.gz \
  --vep-forks 8
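A missing or mistyped reference path may be easier to catch before the run starts than from VEP's error output. The following sketch uses a hypothetical helper, `check_vep_inputs`, that confirms the input VCF, the GRCh37 FASTA, and the ExAC filter VCF used in the command above are readable. It assumes those two reference files live directly under `$VEPCACHEDIR`, as in the sample session.

```shell
#!/bin/bash
# check_vep_inputs VCF [CACHE_DIR]: hypothetical helper. Reports every missing or
# unreadable input on stderr and returns 1 if any were found, otherwise 0.
check_vep_inputs() {
  local vcf="$1"
  local cache="${2:-$VEPCACHEDIR}"   # default to the directory set by the VEP module
  local missing=0 f
  for f in "$vcf" "$cache/GRCh37.fa" "$cache/ExAC.r0.3.sites.vep.vcf.gz"; do
    if [[ ! -r "$f" ]]; then
      echo "missing or unreadable: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_vep_inputs test.vcf && vcf2maf.pl ...` runs the conversion only when all three files are present.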
Batch job on Biowulf

Create a batch input file (e.g. vcf2maf.sh) that processes the input file 'test.vcf'. For example:

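#!/bin/bash
set -e
# Load vcf2maf and the matching VEP release
module load vcf2maf VEP/86
# Annotate test.vcf with VEP and write test.maf; keep --vep-forks equal to --cpus-per-task
vcf2maf.pl --input-vcf test.vcf --output-maf test.maf --vep-path $VEP_HOME --vep-data $VEPCACHEDIR \
  --ref-fasta $VEPCACHEDIR/GRCh37.fa --filter-vcf $VEPCACHEDIR/ExAC.r0.3.sites.vep.vcf.gz --vep-forks 8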

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=8 --mem=20g vcf2maf.sh
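Hard-coding `--vep-forks 8` means the script must be edited whenever `--cpus-per-task` changes. A hedged variant of vcf2maf.sh reads the fork count from Slurm's `SLURM_CPUS_PER_TASK` instead and takes the VCF path as its first argument (sbatch passes arguments after the script name to the script). The function name `run_vcf2maf`, the `x.vcf -> x.maf` naming rule, and the `SLURM_JOB_ID` guard (so that sourcing the file outside a job only defines the function) are assumptions of this sketch.

```shell
#!/bin/bash
# vcf2maf.sh (sketch): convert the VCF given as the first argument to a MAF.
# Assumed usage: sbatch --cpus-per-task=8 --mem=20g vcf2maf.sh sample1.vcf
set -e

# run_vcf2maf VCF: hypothetical helper; writes the MAF next to the VCF (x.vcf -> x.maf)
run_vcf2maf() {
  local vcf="$1"
  local maf="${vcf%.vcf}.maf"
  # Use the CPUs Slurm allocated; fall back to 4, the documented minimum
  local forks="${SLURM_CPUS_PER_TASK:-4}"
  vcf2maf.pl --input-vcf "$vcf" --output-maf "$maf" \
    --vep-path "$VEP_HOME" --vep-data "$VEPCACHEDIR" \
    --ref-fasta "$VEPCACHEDIR/GRCh37.fa" \
    --filter-vcf "$VEPCACHEDIR/ExAC.r0.3.sites.vep.vcf.gz" \
    --vep-forks "$forks"
}

# Only do the real work inside a Slurm job
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
  module load vcf2maf VEP/86
  run_vcf2maf "$1"
fi
```

With this version, `sbatch --cpus-per-task=16 --mem=20g vcf2maf.sh sample1.vcf` would run VEP with 16 forks without editing the script.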
Swarm of Jobs on Biowulf

Create a swarmfile (e.g. vcf2maf.swarm) containing one vcf2maf.pl command per line, with one line per input VCF.
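Swarm runs each line of the swarmfile as a separate command. The sketch below uses a hypothetical shell function, `make_vcf2maf_swarm`, to write one vcf2maf.pl line per input VCF with the same options as the batch example. The VCF names are placeholders and are assumed to contain no spaces; `$VEP_HOME` and `$VEPCACHEDIR` are written literally so that they expand on the compute node after swarm loads the modules.

```shell
#!/bin/bash
# make_vcf2maf_swarm OUTFILE VCF...: hypothetical helper that writes one
# vcf2maf.pl command per VCF to OUTFILE (x.vcf -> x.maf).
make_vcf2maf_swarm() {
  local out="$1"; shift
  local vcf
  for vcf in "$@"; do
    # Single-quoted format string keeps $VEP_HOME and $VEPCACHEDIR unexpanded
    printf 'vcf2maf.pl --input-vcf %s --output-maf %s --vep-path $VEP_HOME --vep-data $VEPCACHEDIR --ref-fasta $VEPCACHEDIR/GRCh37.fa --filter-vcf $VEPCACHEDIR/ExAC.r0.3.sites.vep.vcf.gz --vep-forks 8\n' \
      "$vcf" "${vcf%.vcf}.maf"
  done > "$out"
}
```

For example, `make_vcf2maf_swarm vcf2maf.swarm sample1.vcf sample2.vcf` writes a two-line swarmfile, one command per sample.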

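Submit this job using the swarm command. Here -g 20 requests 20 GB of memory and -t 8 requests 8 CPUs for each command, and --module loads vcf2maf and VEP/86 before each command runs.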

swarm -f vcf2maf.swarm --module vcf2maf,VEP/86 -g 20 -t 8
Interactive job on Biowulf

Allocate an interactive session (for example, sinteractive --cpus-per-task=8 --mem=20g), then run the commands shown under 'On Helix' above.

Documentation