High-Performance Computing at the NIH
TransVar on Biowulf and Helix

TransVar is a versatile annotator for 3-way conversion and annotation among genomic characterization(s) of mutations (e.g., chr3:g.178936091G>A) and transcript-dependent annotation(s) (e.g., PIK3CA:p.E545K, PIK3CA:c.1633G>A, NM_006218.2:p.E545K, or NP_006266.2:p.G240Afs*50). It is specifically designed to resolve ambiguous mutation annotations arising from differential transcript usage. TransVar remains aware of the underlying unknown transcript structure (exon boundary, reference amino acid/base) while performing reverse annotation (via fuzzy matching from protein level to cDNA level).

Note 1:

TransVar hg19 files are located under /fdb/transvar.

Note 2:

To avoid filling up your /home/$USER disk quota, first create a symbolic link so that '/home/$USER/.transvar.download' points to '/data/$USER/transvar'.
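
One way to set up the link from Note 2 is sketched below. The helper name link_transvar_download is hypothetical and only used for illustration; the two paths in the commented call are the ones given in the note. The sketch assumes that an existing symbolic link should be left alone and that an existing real directory should be moved aside by hand rather than overwritten.

```shell
# Hypothetical helper: create the target directory, then link to it.
link_transvar_download() {
    target=$1   # real directory on /data
    link=$2     # path under /home that should become a symbolic link
    mkdir -p "$target" || return 1
    if [ -L "$link" ]; then
        echo "$link is already a symbolic link; leaving it alone" >&2
        return 0
    elif [ -e "$link" ]; then
        echo "$link exists and is not a link; move it aside first" >&2
        return 1
    fi
    ln -s "$target" "$link"
}

# On Helix or Biowulf this would be called as:
# link_transvar_download "/data/$USER/transvar" "/home/$USER/.transvar.download"
```

Running ls -l on the link path afterwards should show it pointing at the /data directory.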

Running on Helix

Sample session:

helix$ module load transvar
helix$ transvar config --download_anno --refversion hg19
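
Once the annotation files are in place, the three identifier levels from the description above are each handled by their own subcommand. The sketch below only prints the command lines and does not run TransVar. The helper transvar_subcommand is hypothetical, and the ganno/canno/panno subcommands and the --ccds option are taken from the upstream TransVar documentation, so check them against the installed version with transvar --help.

```shell
# Hypothetical helper: pick the subcommand from the level marker in the identifier.
transvar_subcommand() {
    case "$1" in
        *:g.*) echo ganno ;;   # genomic level, e.g. chr3:g.178936091G>A
        *:c.*) echo canno ;;   # cDNA level, e.g. PIK3CA:c.1633G>A
        *:p.*) echo panno ;;   # protein level, e.g. PIK3CA:p.E545K
        *)     echo "unrecognized identifier: $1" >&2; return 1 ;;
    esac
}

# Print (not execute) one command line per example identifier.
for id in 'chr3:g.178936091G>A' 'PIK3CA:c.1633G>A' 'PIK3CA:p.E545K'; do
    echo "transvar $(transvar_subcommand "$id") -i '$id' --ccds"
done
```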

Submitting a single batch job

1. Create a script file containing lines similar to those below. Adjust the paths to match your own setup before running.

#!/bin/bash 

module load transvar
cd /data/$USER/somewhere
transvar config --download_anno --refversion hg19
....
....

2. Submit the script on Biowulf.

$ sbatch myscript
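
As a rough illustration of the two steps above, the snippet below writes a batch script to disk and checks its syntax before submission. The file name transvar_batch.sh, the output file name, and the annotation step are assumptions made for this sketch; the panno subcommand and --ccds option come from the upstream TransVar documentation and are not specific to Biowulf.

```shell
# Write an illustrative batch script; the quoted 'EOF' keeps $USER unexpanded
# so it is resolved when the job runs.
cat > transvar_batch.sh <<'EOF'
#!/bin/bash
set -e
module load transvar
cd "/data/$USER/somewhere"
transvar config --download_anno --refversion hg19
# Assumed example annotation step
transvar panno -i 'PIK3CA:p.E545K' --ccds > panno_result.txt
EOF

# Parse the script without executing it.
bash -n transvar_batch.sh && echo "syntax OK"
```

The script would then be submitted with sbatch transvar_batch.sh.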

Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/$USER/swarmfile). Here is a sample file:

cd /data/$USER/run1/; transvar config --download_anno --refversion hg19
cd /data/$USER/run2/; transvar config --download_anno --refversion hg19
cd /data/$USER/run3/; transvar config --download_anno --refversion hg19
........
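
For more than a handful of runs, a command file like the sample above can be generated with a loop instead of being typed by hand. This is a minimal sketch: the output name swarmfile and the run directory names run1 to run3 are placeholders, and $USER is written literally into the file on the assumption that it is expanded when each command runs.

```shell
# Output file name is a placeholder; override by setting SWARMFILE beforehand.
SWARMFILE=${SWARMFILE:-swarmfile}

: > "$SWARMFILE"    # start from an empty file
for run in run1 run2 run3; do
    # Single-quoted format string keeps $USER literal in the generated line.
    printf 'cd /data/$USER/%s/; transvar config --download_anno --refversion hg19\n' "$run" >> "$SWARMFILE"
done

cat "$SWARMFILE"
```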

The -f flag is required; it specifies the name of the swarm command file.

Submit the swarm job:

$ swarm -f swarmfile --module transvar

- Use the -g flag to request more memory (default: 1.5 GB per line in the swarm file)

For more information regarding running swarm, see swarm.html

 

Running an interactive job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

[user@biowulf]$ sinteractive 

[user@pXXXX]$ cd /data/$USER/myruns

[user@pXXXX]$ module load transvar

[user@pXXXX]$ transvar config --download_anno --refversion hg19
[user@pXXXX]$ exit
slurm stepepilog here!
                   
[user@biowulf]$ 

Documentation

https://bitbucket.org/wanding/transvar