Oncotator (http://www.broadinstitute.org/cancer/cga/oncotator) is a tool for annotating genomic point mutations (SNPs/SNVs) and indels. It is primarily intended for human genome variant callsets, and only data sources relevant to cancer researchers are provided. However, the tool can technically annotate any kind of information onto variant callsets from any organism, and the Broad Institute website has instructions on preparing custom data sources for inclusion in the process.
The UniProt Exact Match For GENCODE v19 file gives selection priority to transcripts whose protein sequences exactly match the UniProt protein sequence.
On Biowulf, the file can be found in the Oncotator database directory, at "$ONCOTATOR_DATASOURCE/tx_exact_uniprot_matches.txt".
A modification of the UniProt Exact Match For GENCODE v19 file gives priority to known clinical protein changes. On Biowulf, it can be found in the Oncotator database directory, at "$ONCOTATOR_DATASOURCE/tx_exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt". For more details, please see the PowerPoint presentation tx_selection_results_LTLedits.pptx.
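A minimal sketch for picking one of these two files in a job script follows. `tx_override_file` is a hypothetical helper, not part of Oncotator or the Biowulf module. It assumes `ONCOTATOR_DATASOURCE` has been set by `module load oncotator`, and the example in its comments assumes the installed release accepts a transcript list through a `-c`/`--canonical-tx-file` option. Confirm that with `oncotator --help` before relying on it.

```shell
#!/bin/bash
# Hypothetical helper (not part of Oncotator): print the path of one of the
# transcript-priority files in the Oncotator datasource directory.
#   uniprot  -> tx_exact_uniprot_matches.txt
#   clinical -> tx_exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt
tx_override_file() {
    local choice="$1" name path
    case "$choice" in
        uniprot)  name="tx_exact_uniprot_matches.txt" ;;
        clinical) name="tx_exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt" ;;
        *)
            echo "unknown choice '$choice' (expected uniprot or clinical)" >&2
            return 2
            ;;
    esac
    # ONCOTATOR_DATASOURCE is assumed to be set by 'module load oncotator'
    if [[ -z "${ONCOTATOR_DATASOURCE:-}" ]]; then
        echo "ONCOTATOR_DATASOURCE is not set" >&2
        return 1
    fi
    path="$ONCOTATOR_DATASOURCE/$name"
    if [[ ! -r "$path" ]]; then
        echo "cannot read $path" >&2
        return 1
    fi
    printf '%s\n' "$path"
}

# Example use in a job script (the -c/--canonical-tx-file option is an
# assumption; confirm it with 'oncotator --help'):
#   tx_file=$(tx_override_file uniprot) || exit 1
#   oncotator -v --no-multicore --db-dir "$ONCOTATOR_DATASOURCE" \
#       -c "$tx_file" input.maf.txt output.tsv hg19
```

Capturing the path first and exiting on failure keeps a missing file from silently turning into an empty argument.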
Allocate an interactive session and run the program. Sample session:
```
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load oncotator
[user@cn3144 ~]$ oncotator -v --no-multicore --db-dir /fdb/oncotator/oncotator_v1_ds_April052016 \
    /usr/local/apps/oncotator/oncotator-1.9.7.0/test/testdata/maflite/Patient0.snp.maf.txt \
    exampleOutput.tsv hg19
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
```
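Once a run like the one above completes, the output can be summarized from the shell. The hypothetical `count_by_column` function below counts rows per value of one column, most frequent first. It assumes the output is tab-separated, that lines beginning with `#` are comments, that the first other line is the column header, and that a `Variant_Classification` column is present, following the TCGA MAF convention. Pass a different column name if your output uses another layout.

```shell
#!/bin/bash
# Hypothetical helper: count rows per value of one column in a tab-separated
# Oncotator output file, most frequent first. Assumes lines starting with '#'
# are comments and the first other line is the column header.
count_by_column() {
    local file="$1" column="${2:-Variant_Classification}"
    awk -F'\t' -v col="$column" '
        /^#/ { next }                          # skip comment/version lines
        !have_header {                         # first non-comment line
            have_header = 1
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) {
                print "column not found: " col > "/dev/stderr"
                missing = 1
                exit 1
            }
            next
        }
        { counts[$idx]++ }                     # tally the chosen column
        END {
            if (missing) exit 1
            for (k in counts) print counts[k] "\t" k | "sort -k1,1nr"
            close("sort -k1,1nr")
        }
    ' "$file"
}

# Example:
#   count_by_column exampleOutput.tsv
#   count_by_column exampleOutput.tsv Hugo_Symbol
```

The function returns a non-zero status when the requested column is missing, so it can be used safely in scripts.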
Create a batch input file (e.g. oncotator.sh). For example:
```
#!/bin/bash
module load oncotator
oncotator -v --no-multicore --db-dir /fdb/oncotator/oncotator_v1_ds_April052016 \
    /usr/local/apps/oncotator/oncotator-1.9.7.0/test/testdata/maflite/Patient0.snp.maf.txt \
    exampleOutput.tsv hg19
```
Submit this job using the Slurm sbatch command.
sbatch --mem=10g oncotator.sh
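The batch script above hard-codes its input and output paths. One way to reuse a single script for many inputs is to pass them as arguments after the script name, which sbatch forwards to the script. The following is a sketch under that approach. `annotate_one` is a hypothetical name, `ONCOTATOR_DATASOURCE` is assumed to be set by the module (as the swarm example below also assumes), and no input format is specified, as in the example above.

```shell
#!/bin/bash
#SBATCH --mem=10g
# Sketch of a reusable oncotator.sh: the input file, output file and
# (optionally) genome build are passed as arguments instead of hard-coded.

# Hypothetical helper: annotate one input file with Oncotator.
annotate_one() {
    local input="$1" output="$2" build="${3:-hg19}"
    if [[ ! -r "$input" ]]; then
        echo "cannot read input file: $input" >&2
        return 1
    fi
    # ONCOTATOR_DATASOURCE is assumed to be set by 'module load oncotator'
    oncotator -v --no-multicore --db-dir "$ONCOTATOR_DATASOURCE" \
        "$input" "$output" "$build"
}

# Only run when the script receives at least an input and an output,
# e.g. 'sbatch oncotator.sh Patient0.snp.maf.txt exampleOutput.tsv hg19'
if [[ $# -ge 2 ]]; then
    module load oncotator || exit 1
    annotate_one "$1" "$2" "${3:-hg19}"
fi
```

Because the memory request sits in an `#SBATCH` line, the job can be submitted as `sbatch oncotator.sh /path/to/input.maf.txt output.tsv hg19`; an explicit `--mem` on the sbatch command line would override it.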
Create a swarmfile (e.g. oncotator.swarm). For example:
```
oncotator -v --no-multicore -i VCF --db-dir=$ONCOTATOR_DATASOURCE /path/to/file1.vcf Output1.tsv hg19
oncotator -v --no-multicore -i VCF --db-dir=$ONCOTATOR_DATASOURCE /path/to/file2.vcf Output2.tsv hg19
oncotator -v --no-multicore -i VCF --db-dir=$ONCOTATOR_DATASOURCE /path/to/file3.vcf Output3.tsv hg19
```
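Writing the swarm file by hand becomes tedious for many inputs. The sketch below generates one command line per `.vcf` file in a directory. `make_swarm` is a hypothetical helper, and `-i VCF` (telling Oncotator to read VCF rather than its default input format) is an assumption to confirm with `oncotator --help`. Each output is named after its input and written to the directory in which the subjob runs.

```shell
#!/bin/bash
# Hypothetical helper: write one oncotator command per *.vcf file found in a
# directory. Each output is named after its input (sample.vcf -> sample.tsv)
# and is written to the directory the subjob runs in.
make_swarm() {
    local vcf_dir="$1" swarm_file="${2:-oncotator.swarm}" build="${3:-hg19}"
    local vcf base
    : > "$swarm_file"                         # start from an empty file
    for vcf in "$vcf_dir"/*.vcf; do
        [[ -e "$vcf" ]] || continue           # no match: glob stays literal
        base=$(basename "$vcf" .vcf)
        # $ONCOTATOR_DATASOURCE is written literally (single quotes) so it is
        # expanded when each subjob runs; %q quotes unusual characters in paths.
        printf 'oncotator -v --no-multicore -i VCF --db-dir=$ONCOTATOR_DATASOURCE %q %q %s\n' \
            "$vcf" "$base.tsv" "$build" >> "$swarm_file"
    done
}

# Example:
#   make_swarm /path/to/vcfs oncotator.swarm hg19
#   swarm -f oncotator.swarm -g 10 --module oncotator
```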
Submit this job using the swarm command.
swarm -f oncotator.swarm -g 10 --module oncotator

where

| -g # | Number of gigabytes of memory required for each process (one line in the swarm command file) |
| --module oncotator | Loads the oncotator module for each subjob in the swarm |
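After the swarm finishes, a quick check can flag subjobs that produced no output. The hypothetical `check_swarm_outputs` function reads the swarm file, assumes every command ends with `<output file> <genome build>` as in the example above, and reports outputs that are missing or empty. Run it from the directory the outputs were written to, so that relative paths resolve; paths containing spaces are not handled.

```shell
#!/bin/bash
# Hypothetical post-run check: for each command line in a swarm file, take the
# second-to-last field as the output path and report it if it is missing or
# empty. Blank lines and lines starting with '#' are skipped.
check_swarm_outputs() {
    local swarm_file="$1" out missing=0
    while read -r out; do
        if [[ ! -s "$out" ]]; then
            echo "missing or empty: $out" >&2
            missing=$((missing + 1))
        fi
    done < <(awk 'NF >= 2 && $1 !~ /^#/ { print $(NF-1) }' "$swarm_file")
    echo "$missing output(s) missing or empty"
    [[ $missing -eq 0 ]]                      # non-zero status if any failed
}

# Example:
#   check_swarm_outputs oncotator.swarm
```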