High-Performance Computing at the NIH
pVAC-Seq on Biowulf2 & Helix

pVAC-Seq offers epitope binding predictions for missense, inframe indel, and frameshift mutations.

pVAC-Seq was developed in the Griffith Lab at Washington University in St. Louis. [pVAC-Seq website]

On Helix

The input to the pVAC-Seq pipeline is a VEP-annotated single-sample VCF. In addition to the standard VEP annotations, pVAC-Seq also requires the annotations provided by the Downstream and Wildtype VEP plugins. Create the VCF file as described in the documentation, and then run pVAC-Seq. Sample session:

[user@helix ~]$ module load VEP pvacseq

[user@helix ~]$ variant_effect_predictor.pl --offline --cache --dir_cache $VEPCACHEDIR --input_file test.vcf --format vcf --vcf --symbol --plugin Downstream --plugin Wildtype --terms SO  --output_file out.vcf --assembly GRCh38
2016-11-29 15:02:55 - Read existing cache info
2016-11-29 15:02:55 - Loaded plugin: Downstream
2016-11-29 15:02:55 - Loaded plugin: Wildtype
2016-11-29 15:02:55 - Starting...
2016-11-29 15:02:56 - Read 5000 variants into buffer
2016-11-29 15:02:56 - Reading transcript data from cache and/or database
[====================================================================================================================================================================]  [ 100% ]
2016-11-29 15:02:56 - Retrieved 1230 transcripts (0 mem, 1278 cached, 0 DB, 48 duplicates)
[...]

[user@helix ~]$ pvacseq run -e 11 --iedb-install-directory /usr/local/apps/pvacseq/ out.vcf Test HLA-C*07:02  NNalign NetMHC NetMHCIIpan NetMHCcons `pwd`
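Before running pvacseq, it can help to confirm that the VEP output actually carries the plugin annotations. The following is a minimal sketch, not part of pvacseq or VEP: check_vep_annotations is a hypothetical helper, and it assumes that the Downstream and Wildtype plugins add fields named DownstreamProtein and WildtypeProtein to the CSQ description in the VCF header.

```shell
# Hypothetical helper: succeed only if the CSQ header line of a VCF lists
# the fields that the Downstream and Wildtype plugins are assumed to add.
check_vep_annotations() {
    local vcf header field
    vcf=$1
    # The CSQ INFO header line describes the pipe-separated annotation fields.
    header=$(grep '^##INFO=<ID=CSQ' "$vcf") || {
        echo "no CSQ header found in $vcf" >&2
        return 1
    }
    # Field names are an assumption about the plugin output.
    for field in DownstreamProtein WildtypeProtein; do
        case $header in
            *"$field"*) ;;
            *) echo "missing $field in CSQ header of $vcf" >&2; return 1 ;;
        esac
    done
}
```

For example, "check_vep_annotations out.vcf && echo ok" prints ok only when both fields are present in the header.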
   
Batch job on Biowulf

Sample batch script:

#!/bin/bash
set -e

cd /data/susanc/pvacseq

module load pvacseq
allele="HLA-A*23:01,HLA-A*68:02,HLA-B*07:17,HLA-B*08:01,HLA-C*02:02,HLA-C*17:01"
pvacseq run -e 8,9,10,11 --fasta-size=50 final.vcf Test "${allele}" \
    {NNalign,NetMHC,NetMHCIIpan,NetMHCcons,NetMHCpan,PickPocket,SMM,SMMPMBEC,SMMalign} outdir
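The allele argument in the script above is a single comma-separated string. When the list is long, it may be easier to maintain one allele per argument and join them. The sketch below is only an illustration: join_by_comma is a hypothetical helper, not a pvacseq feature, and the alleles are the ones from the sample script.

```shell
# Hypothetical helper (not part of pvacseq): join arguments with commas.
join_by_comma() {
    local IFS=,
    # "$*" joins the arguments using the first character of IFS.
    printf '%s\n' "$*"
}

# Single quotes keep the shell from treating '*' as a filename wildcard.
allele=$(join_by_comma \
    'HLA-A*23:01' 'HLA-A*68:02' \
    'HLA-B*07:17' 'HLA-B*08:01' \
    'HLA-C*02:02' 'HLA-C*17:01')

echo "$allele"
```

The resulting variable can then be passed to pvacseq as "${allele}", quoted so that the asterisks are not expanded by the shell.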

The batch file is submitted to the queue with a command similar to the following:

biowulf$ sbatch  myscript

pvacseq is a single-threaded program. There is no point in allocating more than the default 2 CPUs. If the job requires more than the default 4 GB of memory, you can specify the memory required with

sbatch --mem=#g  myscript

where # = number of GB of memory.

Swarm of jobs on Biowulf2

To set up a swarm of jobs, each running the subjobs in local mode, use a swarm file like this:

pvacseq run -e 11 --iedb-install-directory /usr/local/apps/pvacseq/ out1.vcf Test1 HLA-C*07:02  \
       {NNalign,NetMHC,NetMHCIIpan,NetMHCcons,NetMHCpan,PickPocket,SMM,SMMPMBEC,SMMalign} out1
pvacseq run -e 11 --iedb-install-directory /usr/local/apps/pvacseq/ out2.vcf Test2 HLA-C*07:02  \
       {NNalign,NetMHC,NetMHCIIpan,NetMHCcons,NetMHCpan,PickPocket,SMM,SMMPMBEC,SMMalign} out2
[...]

Then submit the swarm, requesting memory for each task with -g # (where # = number of GB; 3 GB in this example):

biowulf$ swarm -g 3  swarmfile  --module pvacseq

See the swarm documentation for more details.
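For many input files, the swarm file can be generated rather than written by hand. The following is a rough sketch under stated assumptions: make_swarmfile is a hypothetical helper, inputs are assumed to be named <sample>.vcf, the sample name and the <sample>_out output directory are illustrative choices, and the allele and a short algorithm list are copied from the examples on this page.

```shell
# Hypothetical helper: write one pvacseq command line per input VCF.
# Usage: make_swarmfile SWARMFILE VCF [VCF ...]
make_swarmfile() {
    local swarmfile vcf sample allele algorithms
    swarmfile=$1
    shift
    allele='HLA-C*07:02'                              # taken from the example above
    algorithms='NNalign NetMHC NetMHCIIpan NetMHCcons'
    : > "$swarmfile"                                  # start with an empty file
    for vcf in "$@"; do
        sample=$(basename "$vcf" .vcf)                # sample name = file name minus .vcf
        printf 'pvacseq run -e 11 --iedb-install-directory /usr/local/apps/pvacseq/ %s %s %s %s %s\n' \
            "$vcf" "$sample" "$allele" "$algorithms" "${sample}_out" >> "$swarmfile"
    done
}
```

A call such as "make_swarmfile swarmfile *.vcf" would write one line per VCF in the current directory; the file can then be submitted with the swarm command shown above.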

Interactive job on Biowulf

Allocate an interactive session and run pvacseq there. Sample session:

biowulf$ sinteractive  --mem=5G
salloc.exe: Granted job allocation 240602
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0044 are ready for job

cn0044$ module load VEP pvacseq

cn0044$ variant_effect_predictor.pl --offline --cache --dir_cache $VEPCACHEDIR --input_file test.vcf --format vcf \
     --vcf --symbol --plugin Downstream --plugin Wildtype --terms SO  --output_file out.vcf --assembly GRCh38

cn0044$ pvacseq run -e 11 --iedb-install-directory /usr/local/apps/pvacseq/ out.vcf Test HLA-C*07:02 \
      {NNalign,NetMHC,NetMHCIIpan,NetMHCcons,NetMHCpan,PickPocket,SMM,SMMPMBEC,SMMalign} `pwd`

[...]
cn0044$ exit
salloc.exe: Relinquishing job allocation 240602
biowulf$

Documentation

pVAC-Seq documentation