High-Performance Computing at the NIH
WGSA on Biowulf

WGSA is an annotation pipeline for human genome re-sequencing studies that facilitates the functional annotation step of whole-genome sequencing (WGS). WGSA currently supports annotation of SNVs and indels locally, without remote database requests, allowing it to scale to large WGS studies.


Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Two input files are required: the variants, as either a tab-delimited file or a VCF, and a config file adapted from a template (configWGSA07.txt). Both files can be found in $WGSAHOME/examples. Sample session:

[user@biowulf]$ sinteractive --mem=100g --cpus-per-task=32 --time=24:00:00
salloc.exe: Pending job allocation 49231679
salloc.exe: job 49231679 queued and waiting for resources
salloc.exe: job 49231679 has been allocated resources
salloc.exe: Granted job allocation 49231679
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3205 are ready for job
[user@cn3205]$ module load WGSA
[+] Loading java 1.8.0_11 ...
[+] Loading WGSA 07 ...
[user@cn3205]$ cp $WGSAHOME/examples/* .
[user@cn3205]$ mkdir tmp work
[user@cn3205]$ java WGSA07 configWGSA07.txt -m 32 -t 8
Notice: Licenses are needed for commercial usage of software ANNOVAR and resources CADD, DANN, Polyphen2, REVEL and VEST3 (in dbNSFP).
Notice: WGSA pipeline is provided AS IS. No warranties.
Type "Understand" to proceed; "Exit" to exit
Understand
Preparing input files ...
Number of SNV:106 Number of indel:18
Pipeline written to configWGSA07.txt.sh
[user@cn3205]$ bash configWGSA07.txt.sh &> output.txt
[user@cn3205]$ exit
salloc.exe: Relinquishing job allocation 49231679

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. WGSA.sh) that uses the input file 'config.txt'. The heredoc answers the pipeline's license prompt. For example:

#!/bin/bash
mkdir tmp work
module load WGSA
java WGSA07 config.txt -m $((SLURM_MEM_PER_NODE/1024)) -t ${SLURM_CPUS_PER_TASK} << EOF
Understand
EOF
bash config.txt.sh
rm -r tmp work

Submit this job using the Slurm sbatch command, along with appropriate amounts of memory, CPU, and time:

sbatch --cpus-per-task=32 --mem=100g --time=48:00:00 WGSA.sh
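As a rough check of the resource arithmetic in the batch script: with --mem=100g, Slurm sets SLURM_MEM_PER_NODE in megabytes, so dividing by 1024 yields the gigabyte value WGSA's -m flag expects. A minimal sketch (the variable values below simply mirror the sbatch request above; inside a real job Slurm sets them for you):

```shell
# Sketch: how the batch script's -m and -t arguments are derived from
# Slurm's environment. Values assume --mem=100g --cpus-per-task=32.
SLURM_MEM_PER_NODE=102400    # MB; Slurm sets this for --mem=100g
SLURM_CPUS_PER_TASK=32       # Slurm sets this for --cpus-per-task=32
echo "-m $((SLURM_MEM_PER_NODE/1024)) -t ${SLURM_CPUS_PER_TASK}"
# prints: -m 100 -t 32
```

If you change the sbatch allocation, the java command picks up the new values automatically, so the script and the submission stay in sync.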