Oncotator on Biowulf

Oncotator (http://www.broadinstitute.org/cancer/cga/oncotator) is a tool for annotating genomic point mutations (SNPs/SNVs) and indels. It is primarily intended for human genome variant callsets, and only data sources relevant to cancer researchers are provided. However, the tool can technically be used to annotate any kind of information onto variant callsets from any organism, and the Broad Institute web site has instructions on how to prepare custom data sources for inclusion in the process.

References:

Ramos et al. Oncotator: Cancer Variant Annotation Tool. 2015.

Documentation

oncotator Main Site
Type oncotator -h to get a summary of the Oncotator usage instructions.
Oncotator: Overview and Basic Usage
Sources for Oncotator Annotation Data
Oncotator FAQ
Genome Analysis Toolkit (GATK) (not restricted to oncotator)
GATK Support Forum

Important Notes

Module Name: oncotator (see the modules page for more information)
Single-threaded app
Environment variables set
- $ONCOTATOR_DATASOURCE: points to the latest Oncotator database bundle, e.g. /fdb/oncotator/oncotator_v1_ds_April052016
- $ONCOTATOR_TESTDATA: points to the Oncotator test files and data that are used in the examples on this page.
Example files in $ONCOTATOR_TESTDATA
Reference data in /fdb/oncotator/
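Before starting a long run, it can help to confirm that the module actually exported the two variables listed above. The following is a minimal sketch, not part of Oncotator: check_oncotator_env is a hypothetical helper name, and the only assumption is that both variables should point to existing directories.

```shell
# Hypothetical helper (not shipped with Oncotator): verify that the
# variables set by "module load oncotator" exist and point to directories.
check_oncotator_env() {
    rc=0
    for name in ONCOTATOR_DATASOURCE ONCOTATOR_TESTDATA; do
        # Look up the variable whose name is stored in $name (empty if unset).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set (was 'module load oncotator' run?)" >&2
            rc=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value" >&2
            rc=1
        else
            echo "$name=$value"
        fi
    done
    return "$rc"
}
```

After loading the module, calling check_oncotator_env prints one NAME=path line per variable and returns a non-zero status if either is missing.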

Transcript override lists

The Broad Institute highly recommends that you use one of the transcript override lists discussed below, especially with clinical applications of Oncotator. When running Oncotator, provide one of the UniProt Exact Match files with the -c parameter.

UniProt Exact Match For GENCODE v19
Gives selection priority to transcripts with protein sequences that match the UniProt protein sequence exactly.
On Biowulf, the file can be found in the Oncotator database directory, at "$ONCOTATOR_DATASOURCE/tx_exact_uniprot_matches.txt".

UniProt Exact Match + Clinical for GENCODE v19
Gives priority to known clinical protein changes. The file can be found in the Oncotator database directory, at "$ONCOTATOR_DATASOURCE/tx_exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt".
On Biowulf, the file is a modification of the UniProt Exact Match For GENCODE v19. For more details, please see the PowerPoint presentation tx_selection_results_LTLedits.pptx.
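As an illustration of passing an override list with -c, the sketch below wraps the same oncotator invocation used elsewhere on this page and adds the UniProt Exact Match + Clinical file. The wrapper name annotate_with_override and the file names in the usage line are hypothetical; the other options are taken from the examples on this page.

```shell
# Hypothetical wrapper: annotate one input file using the
# "UniProt Exact Match + Clinical" transcript override list.
#   $1  input variant file (e.g. a MAFLITE or VCF file)
#   $2  output file
annotate_with_override() {
    oncotator -v --no-multicore \
        --db-dir "$ONCOTATOR_DATASOURCE" \
        -c "$ONCOTATOR_DATASOURCE/tx_exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt" \
        "$1" "$2" hg19
}
```

With the module loaded, a call might look like annotate_with_override input.maf.txt annotated.tsv. To use the plain UniProt Exact Match list instead, swap in tx_exact_uniprot_matches.txt after -c.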

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load oncotator

[user@cn3144 ~]$ oncotator -v --no-multicore --db-dir /fdb/oncotator/oncotator_v1_ds_April052016 \
/usr/local/apps/oncotator/oncotator-1.9.7.0/test/testdata/maflite/Patient0.snp.maf.txt \
exampleOutput.tsv hg19

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job

Most jobs should be run as batch jobs.

Create a batch input file (e.g. oncotator.sh). For example:

#!/bin/bash
module load oncotator
oncotator -v --no-multicore --db-dir /fdb/oncotator/oncotator_v1_ds_April052016 \
/usr/local/apps/oncotator/oncotator-1.9.7.0/test/testdata/maflite/Patient0.snp.maf.txt \
exampleOutput.tsv hg19

Submit this job using the Slurm sbatch command.

sbatch --mem=10g oncotator.sh

Swarm of Jobs

A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. oncotator.swarm). For example:

oncotator -v --no-multicore --db-dir=$ONCOTATOR_DATASOURCE   /path/to/file1.vcf  Output1.tsv   hg19
oncotator -v --no-multicore --db-dir=$ONCOTATOR_DATASOURCE   /path/to/file2.vcf  Output2.tsv   hg19
oncotator -v --no-multicore --db-dir=$ONCOTATOR_DATASOURCE   /path/to/file3.vcf  Output3.tsv   hg19

Submit this job using the swarm command.

swarm -f oncotator.swarm -g 10 --module oncotator

where

`-g #`	Number of Gigabytes of memory required for each process (1 line in the swarm command file)
`--module oncotator`	Loads the oncotator module for each subjob in the swarm
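For more than a handful of inputs, the swarmfile can be generated rather than typed by hand. The sketch below is one possible approach, not a Biowulf-provided tool: make_oncotator_swarm is a hypothetical helper that writes one command line per *.vcf file in a directory, naming each output after its input. It assumes file names contain no spaces.

```shell
# Hypothetical helper: print one swarm command line per VCF in directory $1.
make_oncotator_swarm() {
    for vcf in "$1"/*.vcf; do
        # If no VCF matches, the pattern is left as-is; skip it.
        [ -e "$vcf" ] || continue
        sample=$(basename "$vcf" .vcf)
        # The single-quoted format keeps $ONCOTATOR_DATASOURCE literal, so it
        # is expanded inside each subjob after the module has been loaded.
        printf 'oncotator -v --no-multicore --db-dir=$ONCOTATOR_DATASOURCE %s %s.tsv hg19\n' \
            "$vcf" "$sample"
    done
}
```

For example, make_oncotator_swarm /path/to/vcfs > oncotator.swarm writes the swarmfile, which can then be submitted with the swarm command shown above.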