Biowulf High Performance Computing at the NIH
Oncotator on Biowulf

Oncotator (http://www.broadinstitute.org/cancer/cga/oncotator) is a tool for annotating genomic point mutations (SNPs/SNVs) and indels with information from a set of data sources. It is primarily intended for human genome variant callsets, and only data sources relevant to cancer researchers are provided. However, the tool can be used to annotate any kind of information onto variant callsets from any organism, and the Broad Institute web site has instructions on preparing custom data sources for inclusion in the process.

Important Notes

Transcript override lists

The Broad Institute strongly recommends using one of the transcript override lists described below, especially for clinical applications of Oncotator. These lists tell Oncotator which transcript to select, where possible, when annotating a variant. When running Oncotator, pass one of the UniProt Exact Match files with the -c parameter.
  1. UniProt Exact Match for GENCODE v19

    Gives selection priority to transcripts with protein sequences that match the UniProt protein sequence exactly.

    On Biowulf, the file can be found in the Oncotator database directory, at "$ONCOTATOR_DATASOURCE/tx_exact_uniprot_matches.txt".

  2. UniProt Exact Match + Clinical for GENCODE v19

    Gives priority to known clinical protein changes. This list is a modification of the UniProt Exact Match for GENCODE v19 list; for details, see the PowerPoint presentation tx_selection_results_LTLedits.pptx.

    On Biowulf, the file can be found in the Oncotator database directory, at "$ONCOTATOR_DATASOURCE/tx_exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt".
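A minimal sketch for checking the override list before a long run: the hypothetical shell function oncotator_tx_path below (not part of Oncotator or the oncotator module) maps a choice of exact or clinical to the corresponding file name listed above and prints its full path under $ONCOTATOR_DATASOURCE, returning a non-zero status if the variable is unset or the file cannot be read. It assumes, as the paths above suggest, that loading the oncotator module sets ONCOTATOR_DATASOURCE.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with Oncotator): print the full path of one
# of the two transcript override lists in $ONCOTATOR_DATASOURCE, or fail early.
#   exact    -> tx_exact_uniprot_matches.txt
#   clinical -> tx_exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt
# Assumes "module load oncotator" has set ONCOTATOR_DATASOURCE.
oncotator_tx_path() {
    local file
    case "$1" in
        exact)    file="tx_exact_uniprot_matches.txt" ;;
        clinical) file="tx_exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt" ;;
        *) echo "usage: oncotator_tx_path exact|clinical" >&2; return 2 ;;
    esac
    # Refuse to continue if the module environment is missing.
    if [[ -z "${ONCOTATOR_DATASOURCE:-}" ]]; then
        echo "ONCOTATOR_DATASOURCE is not set; run 'module load oncotator'" >&2
        return 1
    fi
    # Refuse to continue if the chosen list is absent or unreadable.
    if [[ ! -r "$ONCOTATOR_DATASOURCE/$file" ]]; then
        echo "override list not found: $ONCOTATOR_DATASOURCE/$file" >&2
        return 1
    fi
    printf '%s\n' "$ONCOTATOR_DATASOURCE/$file"
}
```

For example, in a batch script after module load oncotator (the input and output names are placeholders):

tx=$(oncotator_tx_path exact) || exit 1
oncotator -v --no-multicore -c "$tx" --db-dir "$ONCOTATOR_DATASOURCE" /path/to/input.maf.txt output.tsv hg19

Checking the path first means a typo in the file name stops the script before Oncotator is started.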

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load oncotator

[user@cn3144 ~]$ oncotator -v --no-multicore --db-dir /fdb/oncotator/oncotator_v1_ds_April052016 \
/usr/local/apps/oncotator/oncotator-1.9.7.0/test/testdata/maflite/Patient0.snp.maf.txt \
exampleOutput.tsv hg19

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. oncotator.sh). For example:

#!/bin/bash
module load oncotator
oncotator -v --no-multicore --db-dir /fdb/oncotator/oncotator_v1_ds_April052016 \
/usr/local/apps/oncotator/oncotator-1.9.7.0/test/testdata/maflite/Patient0.snp.maf.txt \
exampleOutput.tsv hg19

Submit this job using the Slurm sbatch command. The --mem=10g option requests 10 GB of memory for the job.

sbatch --mem=10g oncotator.sh

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

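Create a swarmfile (e.g. oncotator.swarm) with one command per line, each reading a different input file and writing a different output file. Because these inputs are VCF files, each command passes -i VCF (Oncotator's default input format is MAFLITE). For example:

oncotator -v --no-multicore -i VCF --db-dir=$ONCOTATOR_DATASOURCE /path/to/file1.vcf Output1.tsv hg19
oncotator -v --no-multicore -i VCF --db-dir=$ONCOTATOR_DATASOURCE /path/to/file2.vcf Output2.tsv hg19
oncotator -v --no-multicore -i VCF --db-dir=$ONCOTATOR_DATASOURCE /path/to/file3.vcf Output3.tsv hg19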

Submit this job using the swarm command.

swarm -f oncotator.swarm -g 10 --module oncotator
where
  -g #                 Number of gigabytes of memory required for each process (1 line in the swarm command file)
  --module oncotator   Loads the oncotator module for each subjob in the swarm
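When there are many input files, the swarmfile can be generated rather than written by hand. The sketch below assumes one VCF per sample in a single directory; make_oncotator_swarm is a hypothetical helper, not part of swarm or Oncotator. It writes one command per *.vcf file, names each output after its input, and adds -i VCF because Oncotator's default input format is MAFLITE.

```shell
#!/bin/bash
# Hypothetical helper (not part of swarm or Oncotator): write one oncotator
# command per VCF in a directory, producing a swarmfile like oncotator.swarm.
#   make_oncotator_swarm VCF_DIR SWARMFILE [GENOME_BUILD]
make_oncotator_swarm() {
    local vcf_dir="$1" swarmfile="$2" build="${3:-hg19}"
    local vcf name n=0
    : > "$swarmfile"                      # create or truncate the swarmfile
    for vcf in "$vcf_dir"/*.vcf; do
        [[ -e "$vcf" ]] || continue       # glob matched nothing
        name=$(basename "$vcf" .vcf)      # sample1.vcf -> sample1
        # $ONCOTATOR_DATASOURCE sits inside single quotes, so it is written
        # literally and expanded later inside each subjob.
        # %q shell-quotes paths that contain spaces or other special characters.
        printf 'oncotator -v --no-multicore -i VCF --db-dir=$ONCOTATOR_DATASOURCE %q %q.tsv %s\n' \
            "$vcf" "$name" "$build" >> "$swarmfile"
        n=$((n + 1))
    done
    echo "wrote $n command(s) to $swarmfile" >&2
}
```

For example (the directory name is a placeholder):

make_oncotator_swarm /path/to/vcfs oncotator.swarm hg19
swarm -f oncotator.swarm -g 10 --module oncotator

Writing $ONCOTATOR_DATASOURCE literally, rather than expanding it when the swarmfile is created, keeps the swarmfile valid even if it is generated in a shell where the module is not loaded; the variable is then expanded in each subjob after --module oncotator loads the module.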