Biowulf High Performance Computing at the NIH
VIRTUS on Biowulf

VIRTUS is a bioinformatics pipeline for viral transcriptome detection and quantification that takes splicing into account. The HPC version does not support Docker and uses Singularity instead.



Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows each prompt):

[user@biowulf]$ sinteractive --gres=lscratch:500 --cpus-per-task=40 --mem=40G
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load VIRTUS
[user@cn3144 ~]$ mkdir -p /data/$USER/VIRTUS; cd /data/$USER/VIRTUS
[user@cn3144 VIRTUS]$ cp "$VIRTUS_TEST_DATA"/* .
[user@cn3144 VIRTUS]$ -h
usage: [-h] [--VIRTUSDir VIRTUSDIR] --genomeDir_human
                         GENOMEDIR_HUMAN --genomeDir_virus GENOMEDIR_VIRUS
                         --salmon_index_human SALMON_INDEX_HUMAN
                         [--salmon_quantdir_human SALMON_QUANTDIR_HUMAN]
                         [--outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN]
                         [--nthreads NTHREADS] [--hit_cutoff HIT_CUTOFF]
                         [-s SUFFIX_SE] [-s1 SUFFIX_PE_1] [-s2 SUFFIX_PE_2]

positional arguments:

optional arguments:
  -h, --help            show this help message and exit
  --genomeDir_human GENOMEDIR_HUMAN
  --genomeDir_virus GENOMEDIR_VIRUS
  --salmon_index_human SALMON_INDEX_HUMAN
  --salmon_quantdir_human SALMON_QUANTDIR_HUMAN
  --outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN
  --nthreads NTHREADS
  --hit_cutoff HIT_CUTOFF
  -s1 SUFFIX_PE_1, --Suffix_PE_1 SUFFIX_PE_1
  -s2 SUFFIX_PE_2, --Suffix_PE_2 SUFFIX_PE_2

[user@cn3144 VIRTUS]$ cwltool --singularity --tmp-outdir-prefix=/lscratch/$SLURM_JOB_ID/ \
--tmpdir-prefix=/lscratch/$SLURM_JOB_ID/ \
--fastq1 ERR3240275_1.fastq.gz --fastq2 ERR3240275_2.fastq.gz \
--genomeDir_human $VIRTUS_INDEX/STAR_index_human \
--genomeDir_virus $VIRTUS_INDEX/STAR_index_virus \
--salmon_index_human $VIRTUS_INDEX/salmon_index_human \
--salmon_quantdir_human salmon_human \
--nthreads 40

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
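
Before starting a paired-end run like the one in the session above, it can help to confirm that both mate files are present in the working directory. The following is a minimal sketch, not part of VIRTUS: it assumes the `_1.fastq.gz` / `_2.fastq.gz` naming seen in the test data, and `mate_of` and `check_pair` are hypothetical helper names.

```bash
#!/bin/bash
# Sketch: derive the second-mate file name from the first and check both exist.
# Assumes mate 1 ends in _1.fastq.gz and mate 2 in _2.fastq.gz, as in the
# sample session; mate_of and check_pair are hypothetical helpers.

# Print the mate-2 path for a mate-1 path ending in _1.fastq.gz.
mate_of() {
    local fq1=$1
    printf '%s\n' "${fq1%_1.fastq.gz}_2.fastq.gz"
}

# Return 0 only if both mates exist and are non-empty.
check_pair() {
    local fq1=$1 fq2
    fq2=$(mate_of "$fq1")
    [ -s "$fq1" ] && [ -s "$fq2" ]
}
```

For example, `check_pair ERR3240275_1.fastq.gz || echo "missing mate file"` would report a problem before any compute time is spent.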

Batch job
Most jobs should be run as batch jobs.

Create a batch input file. For example:

#!/bin/bash
#SBATCH --job-name=S1_VIRTUS
#SBATCH --output=S1_VIRTUS.out
#SBATCH --ntasks=1
#SBATCH --gres=lscratch:500
#SBATCH --cpus-per-task=40
#SBATCH --mem=40G
#SBATCH --time=8:00:00
#SBATCH --partition=norm

set -e
module load VIRTUS
cd /data/$USER/VIRTUS input.csv \ 
--genomeDir_human $VIRTUS_INDEX/STAR_index_human \
--genomeDir_virus $VIRTUS_INDEX/STAR_index_virus \
--salmon_index_human $VIRTUS_INDEX/salmon_index_human \
--salmon_quantdir_human salmon_human \
--nthreads 40
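
The batch script above repeats the literal 40 in both `--cpus-per-task` and `--nthreads`, so the two can drift apart when one is edited. A minimal sketch of one way to keep them in sync, assuming the script runs under Slurm, which sets `SLURM_CPUS_PER_TASK` when `--cpus-per-task` is requested. The helper name `nthreads_from_slurm` and the fallback of 1 thread outside a job are assumptions made for illustration.

```bash
#!/bin/bash
# Sketch: take the thread count from the Slurm allocation instead of
# hard-coding it. SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task
# is requested; the fallback of 1 is an assumption for runs outside a job.

# Print the number of threads to pass to --nthreads.
nthreads_from_slurm() {
    printf '%s\n' "${SLURM_CPUS_PER_TASK:-1}"
}

NTHREADS=$(nthreads_from_slurm)
echo "Using $NTHREADS threads"
```

In the batch script, `--nthreads "$NTHREADS"` could then replace `--nthreads 40`.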

Submit the job:

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.
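
A swarm command file holds one independent command per line. As a rough sketch of how such a file might be generated for several samples, the function below prints one line per sample directory. The function name `make_swarm_lines`, the directory names, and the command string passed in are all placeholders and are not taken from the VIRTUS documentation.

```bash
#!/bin/bash
# Sketch: emit one swarm line per sample directory.
# Usage: make_swarm_lines COMMAND DIR [DIR...]
# COMMAND is a placeholder for the full pipeline command to run in each
# directory; the directories are hypothetical per-sample working directories.
make_swarm_lines() {
    local cmd=$1 dir
    shift
    for dir in "$@"; do
        # Each line changes into the sample directory, then runs the command.
        printf 'cd %s && %s\n' "$dir" "$cmd"
    done
}
```

The output can be redirected to a file (here called `VIRTUS.swarm`, a hypothetical name) and submitted with something like `swarm -f VIRTUS.swarm -g 40 -t 40 --gres=lscratch:500 --module VIRTUS`, which mirrors the resources requested in the batch example. Check `swarm --help` for the options available on your system.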