Biowulf High Performance Computing at the NIH
TransVar on Biowulf

TransVar is a versatile annotator for three-way conversion and annotation among genomic characterizations of mutations (e.g., chr3:g.178936091G>A) and transcript-dependent annotations (e.g., PIK3CA:p.E545K, PIK3CA:c.1633G>A, NM_006218.2:p.E545K, or NP_006266.2:p.G240Afs*50). It is designed in particular to resolve ambiguous mutation annotations arising from differential transcript usage. TransVar keeps track of the underlying unknown transcript structure (exon boundaries, reference amino acids and bases) while performing reverse annotation (via fuzzy matching from the protein level to the cDNA level).

Note 1:

TransVar hg19 files are located under /fdb/transvar.

Note 2:

To avoid filling your /home/$USER disk quota, first create a symbolic link that points '/home/$USER/.transvar.download' to '/data/$USER/transvar'.
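
The link described in Note 2 can be set up along the lines of the sketch below. The helper name link_transvar_cache is hypothetical (it is not part of TransVar or Biowulf); it creates the target directory if needed and refuses to replace an existing real file or directory at the link location.

```shell
# Hypothetical helper: point a download-cache path at a directory on /data.
# Usage: link_transvar_cache <target_dir> <link_path>
link_transvar_cache() {
    local target="$1"
    local link="$2"

    # Make sure the directory that will actually hold the downloads exists.
    mkdir -p "$target" || return 1

    # Do not clobber a real file or directory that already sits at the link path.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing to replace existing non-link: $link" >&2
        return 1
    fi

    # Create the symbolic link, or repoint an existing one.
    ln -sfn "$target" "$link"
}
```

On Biowulf the call would be: link_transvar_cache "/data/$USER/transvar" "/home/$USER/.transvar.download". If a real '.transvar.download' directory already exists, move its contents to /data first and remove it before linking.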

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load transvar

[user@cn3144 ~]$ transvar config --download_anno --refversion hg19

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
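
The sample session only downloads the annotation files. The annotation itself is run with one of TransVar's subcommands: ganno for genomic identifiers, canno for cDNA identifiers, and panno for protein identifiers. As a minimal sketch, the hypothetical helper below (transvar_subcommand is an illustrative name, not a TransVar command) picks the subcommand from the form of the identifier.

```shell
# Hypothetical helper: choose the TransVar subcommand from the identifier type.
#   ...:g.  genomic level  -> ganno
#   ...:c.  cDNA level     -> canno
#   ...:p.  protein level  -> panno
transvar_subcommand() {
    case "$1" in
        *:g.*) echo ganno ;;
        *:c.*) echo canno ;;
        *:p.*) echo panno ;;
        *)
            echo "unrecognized identifier: $1" >&2
            return 1
            ;;
    esac
}

# Example call, left commented out because it assumes 'module load transvar'
# has been run and that the hg19 RefSeq annotation has been downloaded:
# id='PIK3CA:p.E545K'
# transvar "$(transvar_subcommand "$id")" -i "$id" --refseq --refversion hg19
```

The choice of --refseq here is an assumption; use the flag that matches the transcript database you want to annotate against.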

Submitting a single batch job

1. Create a script file containing lines similar to those below. Modify the paths as needed before running.

#!/bin/bash 

module load transvar
cd /data/$USER/somewhere
transvar config --download_anno --refversion hg19
....
....

2. Submit the script on Biowulf.

$ sbatch myscript

Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g., /data/$USER/swarmfile). Here is a sample file:

cd /data/$USER/run1/; transvar config --download_anno --refversion hg19
cd /data/$USER/run2/; transvar config --download_anno --refversion hg19
cd /data/$USER/run3/; transvar config --download_anno --refversion hg19
........

The -f flag is required; it specifies the name of the swarm command file.

Submit the swarm job:

$ swarm -f swarmfile --module transvar

- Use the -g flag to request more memory (default: 1.5 GB per line in the swarmfile).

For more information regarding running swarm, see swarm.html
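
When there are many run directories, a command file like the sample above can be generated rather than typed by hand. The following is a minimal sketch; make_swarmfile is a hypothetical helper name, and the directories passed to it are placeholders for your own run directories.

```shell
# Hypothetical helper: write one swarm command line per run directory.
# Usage: make_swarmfile <output_file> <command> <dir> [<dir> ...]
make_swarmfile() {
    local out="$1"
    local cmd="$2"
    shift 2

    # Start from an empty file so reruns do not append duplicate lines.
    : > "$out"

    local dir
    for dir in "$@"; do
        # Each line changes into the run directory, then runs the command there.
        printf 'cd %s; %s\n' "$dir" "$cmd" >> "$out"
    done
}
```

For example, make_swarmfile swarmfile "transvar config --download_anno --refversion hg19" /data/$USER/run1 /data/$USER/run2 writes two lines in the same form as the sample file, ready to submit with swarm -f swarmfile --module transvar.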

 

Documentation

https://bitbucket.org/wanding/transvar