High-Performance Computing at the NIH
VarScan on Biowulf & Helix

VarScan is a platform-independent, technology-independent software tool for identifying SNPs and indels in massively parallel sequencing of individual and pooled samples. Given data for a single sample, VarScan identifies and filters germline variants based on read counts, base quality, and allele frequency. Given data for a tumor-normal pair, VarScan also determines the somatic status of each variant (Germline, Somatic, or LOH) by comparing read counts between samples.

Running on Helix
$ module load varscan
$ java -Xmx4g -jar $VARSCANHOME/varscan.jar pileup2snp mypileup.file --min-coverage

Running a single batch job on Biowulf

1. Create a batch script similar to the following:

#!/bin/bash

module load varscan
cd /data/$USER/
java -Xmx4g -jar $VARSCANHOME/varscan.jar pileup2snp mypileup.file --min-coverage

2. Submit the script on Biowulf:

$ sbatch jobscript

If the job needs more memory than the default 4 GB, request it with the --mem flag
and, at the same time, change -Xmx4g in your script to the corresponding value (-Xmx4g to -Xmx10g in this example):

$ sbatch --mem=10g jobscript
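The two steps above can also be scripted in one terminal session; a minimal sketch that writes the example batch script with a heredoc (file and directory names are the example values from above):

```shell
# Sketch: write the example batch script to a file with a heredoc.
# Quoting 'EOF' keeps $USER and $VARSCANHOME unexpanded in the written script,
# so they are resolved later, when the job actually runs.
cat > jobscript <<'EOF'
#!/bin/bash
module load varscan
cd /data/$USER/
java -Xmx4g -jar $VARSCANHOME/varscan.jar pileup2snp mypileup.file --min-coverage
EOF
# then submit on Biowulf (not run here):
# sbatch jobscript
```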

Running a swarm of jobs on Biowulf

Set up a swarm command file, with one command per line:

  cd /data/$USER/dir1; java -Xmx4g -jar $VARSCANHOME/varscan.jar pileup2snp mypileup.file --min-coverage
  cd /data/$USER/dir2; java -Xmx4g -jar $VARSCANHOME/varscan.jar pileup2snp mypileup.file --min-coverage
  cd /data/$USER/dir3; java -Xmx4g -jar $VARSCANHOME/varscan.jar pileup2snp mypileup.file --min-coverage
	[......]
  
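When there are many input directories, the swarm file can be generated with a short loop rather than by hand (a sketch; the base path and directory names dir1..dir3 are the example values from above):

```shell
# Sketch: generate the swarm file above, one VarScan command per directory.
# Single quotes keep $VARSCANHOME literal so it is resolved when the task runs.
base=/data/$USER
for d in dir1 dir2 dir3; do
    printf 'cd %s/%s; java -Xmx4g -jar $VARSCANHOME/varscan.jar pileup2snp mypileup.file --min-coverage\n' "$base" "$d"
done > swarmfile
```

Each printed line becomes one independent task when the file is submitted with swarm.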

Submit the swarm file; -f specifies the swarm file name, and --module loads the required module for each command line in the file:

  $ swarm -f swarmfile --module varscan

If each command needs more memory than the default 1.5 GB per line in the swarm file, request it with the -g flag
and, at the same time, change -Xmx4g in your commands to the corresponding value (-Xmx4g to -Xmx10g in this example):

  $ swarm -f swarmfile -g 10 --module varscan

For more information about running swarm, see the swarm documentation (swarm.html).

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load varscan
cn999$ cd /data/$USER/dir
cn999$ java -Xmx4g -jar $VARSCANHOME/varscan.jar pileup2snp mypileup.file --min-coverage
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed (the default is 4 GB), use the --mem flag when allocating the interactive session,
and change -Xmx4g in your commands to the corresponding value (-Xmx4g to -Xmx10g in this example):

biowulf$ sinteractive --mem=10g


Documentation

http://varscan.sourceforge.net/using-varscan.html