vcf2maf on Biowulf

To convert a VCF into a MAF, each variant must be mapped to exactly one of the gene transcripts/isoforms that it might affect. This selection of a single effect per variant is often subjective, so vcf2maf attempts to make the selection criteria smarter, reproducible, and more configurable, with default criteria that lean towards best practices.

Important Notes
vcf2maf depends heavily on VEP. It also requires at least 20 GB of memory and at least 4 CPUs.

Interactive job
Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --cpus-per-task=8 --mem=20g
[user@cn3144 ~]$ module load VEP/92 vcf2maf
[user@cn3144 ~]$ vcf2maf.pl --input-vcf test.vcf --output-maf test.maf --vep-path $VEP_HOME \
  --vep-data $VEP_CACHEDIR --ref-fasta $VEP_CACHEDIR/GRCh37.fa \
  --filter-vcf $VEP_CACHEDIR/ExAC.r0.3.sites.vep.vcf.gz \
  --vep-forks 8

[user@cn3144 ~]$ exit
[user@biowulf ~]$
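
Before leaving the session, it can be worth a quick look at what ended up in test.maf. The following is a minimal sketch rather than anything shipped with vcf2maf: maf_class_counts is a hypothetical helper name, and it assumes the output is a tab-delimited MAF whose header row contains the standard Variant_Classification column, with any leading lines starting with # treated as comments.

```shell
#!/bin/bash
# Hypothetical helper (not part of vcf2maf): count MAF rows per Variant_Classification.
# Assumes a tab-delimited file; lines starting with '#' are skipped as comments.
maf_class_counts() {
    awk -F'\t' '
        /^#/ { next }                      # skip comment lines such as "#version 2.4"
        !col {                             # first non-comment line is the header row
            for (i = 1; i <= NF; i++)
                if ($i == "Variant_Classification") col = i
            next
        }
        { n[$col]++ }                      # tally each data row by its classification
        END { for (c in n) print n[c] "\t" c }
    ' "$1" | sort -k1,1nr -k2,2            # most frequent first, ties broken by name
}
```

Running maf_class_counts test.maf prints one line per classification with its count. If the header has no Variant_Classification column, nothing is printed.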

Batch job
Create a batch input file. For example:

#!/bin/bash
module load VEP/92 vcf2maf
vcf2maf.pl --input-vcf test.vcf --output-maf test.maf --vep-path $VEP_HOME --vep-data $VEP_CACHEDIR \
  --ref-fasta $VEP_CACHEDIR/GRCh37.fa --filter-vcf $VEP_CACHEDIR/ExAC.r0.3.sites.vep.vcf.gz --vep-forks $SLURM_CPUS_PER_TASK

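Submit this job using the Slurm sbatch command, giving the batch file as the final argument:

sbatch [--cpus-per-task=#] [--mem=#] <batch file>

Swarm of Jobs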
Create a swarmfile (e.g. vcf2maf.swarm). For example:

vcf2maf.pl --input-vcf data/test1.vcf --output-maf data/test1.maf --vep-path $VEP_HOME --vep-data $VEP_CACHEDIR --ref-fasta $VEP_CACHEDIR/GRCh37.fa --filter-vcf $VEP_CACHEDIR/ExAC.r0.3.sites.vep.vcf.gz --vep-forks $SLURM_CPUS_PER_TASK
vcf2maf.pl --input-vcf data/test2.vcf --output-maf data/test2.maf --vep-path $VEP_HOME --vep-data $VEP_CACHEDIR --ref-fasta $VEP_CACHEDIR/GRCh37.fa --filter-vcf $VEP_CACHEDIR/ExAC.r0.3.sites.vep.vcf.gz --vep-forks $SLURM_CPUS_PER_TASK
vcf2maf.pl --input-vcf data/test3.vcf --output-maf data/test3.maf --vep-path $VEP_HOME --vep-data $VEP_CACHEDIR --ref-fasta $VEP_CACHEDIR/GRCh37.fa --filter-vcf $VEP_CACHEDIR/ExAC.r0.3.sites.vep.vcf.gz --vep-forks $SLURM_CPUS_PER_TASK
vcf2maf.pl --input-vcf data/test4.vcf --output-maf data/test4.maf --vep-path $VEP_HOME --vep-data $VEP_CACHEDIR --ref-fasta $VEP_CACHEDIR/GRCh37.fa --filter-vcf $VEP_CACHEDIR/ExAC.r0.3.sites.vep.vcf.gz --vep-forks $SLURM_CPUS_PER_TASK

Submit this job using the swarm command:

swarm -f vcf2maf.swarm [-g #] [-t #] --module vcf2maf

where
-g #              Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #              Number of threads/CPUs required for each process (1 line in the swarm command file)
--module vcf2maf  Loads the vcf2maf module for each subjob in the swarm
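
When there are many VCFs, the swarmfile can be generated instead of typed by hand. The following is a minimal sketch, not part of vcf2maf or swarm: make_vcf2maf_swarm is a hypothetical helper name, and it assumes that input files end in .vcf, that their names contain no spaces, and that vcf2maf.pl is the executable provided by the module. The options simply repeat those used in the swarmfile example above.

```shell
#!/bin/bash
# Hypothetical helper: print one vcf2maf.pl command line per *.vcf file in a directory.
# Assumption: file names contain no spaces, since each swarm line is parsed as a shell command.
make_vcf2maf_swarm() {
    local dir="$1" vcf
    # Single quotes keep $VEP_HOME etc. literal, so they are expanded
    # inside each subjob at run time rather than when the swarmfile is written.
    local opts='--vep-path $VEP_HOME --vep-data $VEP_CACHEDIR --ref-fasta $VEP_CACHEDIR/GRCh37.fa --filter-vcf $VEP_CACHEDIR/ExAC.r0.3.sites.vep.vcf.gz --vep-forks $SLURM_CPUS_PER_TASK'
    for vcf in "$dir"/*.vcf; do
        [ -e "$vcf" ] || continue          # glob matched nothing: emit no lines
        # Output name is the input name with .vcf replaced by .maf
        printf 'vcf2maf.pl --input-vcf %s --output-maf %s %s\n' \
            "$vcf" "${vcf%.vcf}.maf" "$opts"
    done
}
```

For example, make_vcf2maf_swarm data > vcf2maf.swarm writes one command per VCF in data/, and the resulting file can then be passed to swarm -f as shown above.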