VIRTUS on Biowulf

VIRTUS is a bioinformatics pipeline for viral transcriptome detection and quantification considering splicing.

The HPC version do not support docker, but use singularity instead. Please do not install cwltool(3.1.20211014180718), since it can't work well with singularity CE.

Features

References:

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input in bold):

[user@biowulf]$ sinteractive --gres=lscratch:500 --cpus-per-task=40 --mem=40G
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load VIRTUS/1.2.1
[user@cn3144 ~]$ mkdir -p /data/$USER/VIRTUS; cd /data/$USER/VIRTUS
[user@cn3144 VIRTUS]$ cp "$VIRTUS_TEST_DATA"/* .
[user@cn3144 VIRTUS]$ VIRTUS_wrapper.py -h
usage: VIRTUS_wrapper.py [-h] [--VIRTUSDir VIRTUSDIR] --genomeDir_human
                         GENOMEDIR_HUMAN --genomeDir_virus GENOMEDIR_VIRUS
                         --salmon_index_human SALMON_INDEX_HUMAN
                         [--salmon_quantdir_human SALMON_QUANTDIR_HUMAN]
                         [--outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN]
                         [--nthreads NTHREADS] [--hit_cutoff HIT_CUTOFF]
                         [-s SUFFIX_SE] [-s1 SUFFIX_PE_1] [-s2 SUFFIX_PE_2]
                         [--fastq]
                         input_path

positional arguments:
  input_path

optional arguments:
  -h, --help            show this help message and exit
  --VIRTUSDir VIRTUSDIR
  --genomeDir_human GENOMEDIR_HUMAN
  --genomeDir_virus GENOMEDIR_VIRUS
  --salmon_index_human SALMON_INDEX_HUMAN
  --salmon_quantdir_human SALMON_QUANTDIR_HUMAN
  --outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN
  --nthreads NTHREADS
  --hit_cutoff HIT_CUTOFF
  -s SUFFIX_SE, --Suffix_SE SUFFIX_SE
  -s1 SUFFIX_PE_1, --Suffix_PE_1 SUFFIX_PE_1
  -s2 SUFFIX_PE_2, --Suffix_PE_2 SUFFIX_PE_2
  --fastq

[user@cn3144 VIRTUS]$ cwltool --singularity --tmp-outdir-prefix=/lscratch/$SLURM_JOB_ID/ \
   --tmpdir-prefix=/lscratch/$SLURM_JOB_ID/ \
   $VIRTUS_WORKFLOW/VIRTUS.PE.cwl \
   --fastq1 ERR3240275_1.fastq.gz --fastq2 ERR3240275_2.fastq.gz \
   --genomeDir_human $VIRTUS_INDEX/STAR_index_human \
   --genomeDir_virus $VIRTUS_INDEX/STAR_index_virus \
   --salmon_index_human $VIRTUS_INDEX/salmon_index_human \
   --salmon_quantdir_human salmon_human \
   --nthreads 40


[user@cn3144 ]$ module load VIRTUS/2.0.1
[user@cn3144 ]$ VIRTUS_wrapper.py -h 
usage: VIRTUS_wrapper.py [-h] [--VIRTUSDir VIRTUSDIR] --genomeDir_human GENOMEDIR_HUMAN --genomeDir_virus GENOMEDIR_VIRUS
                         [--outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN] [--nthreads NTHREADS] [-s SUFFIX_SE] [-s1 SUFFIX_PE_1] [-s2 SUFFIX_PE_2]
                         [--fastq] [--figsize FIGSIZE] [--th_cov TH_COV] [--th_rate TH_RATE]
                         input_path

positional arguments:
  input_path

optional arguments:
  -h, --help            show this help message and exit
  --VIRTUSDir VIRTUSDIR
  --genomeDir_human GENOMEDIR_HUMAN
  --genomeDir_virus GENOMEDIR_VIRUS
  --outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN
  --nthreads NTHREADS
  -s SUFFIX_SE, --Suffix_SE SUFFIX_SE
  -s1 SUFFIX_PE_1, --Suffix_PE_1 SUFFIX_PE_1
  -s2 SUFFIX_PE_2, --Suffix_PE_2 SUFFIX_PE_2
  --fastq
  --figsize FIGSIZE     (default:8,3)
  --th_cov TH_COV       threshold of max viral coverage to plot, test (default:10)
  --th_rate TH_RATE     threshold of max rate virus/human to plot, test (default:0.0001)

[user@cn3144 ]$ cwltool --singularity \
    --tmp-outdir-prefix=/lscratch/$SLURM_JOB_ID/ \
    --tmpdir-prefix=/lscratch/$SLURM_JOB_ID/ \
    $VIRTUS_WORKFLOW/VIRTUS.PE.cwl \
    --fastq1 ERR3240275_1.fastq.gz \
    --fastq2 ERR3240275_2.fastq.gz \
    --genomeDir_human $VIRTUS_INDEX/STAR_index_human \
    --genomeDir_virus $VIRTUS_INDEX/STAR_index_virus \
    --nthreads 40

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. VIRTUS.sh). For example:

#!/bin/bash
#SBATCH --job-name=S1_VIRTUS
#SBATCH --output=S1_VIRTUS.out
#SBATCH --ntasks=1
#SBATCH --gres=lscratch:500
#SBATCH --cpus-per-task=40
#SBATCH --mem=40Gb
#SBATCH --time=8:00:00
#SBATCH --partition=norm

set -e
module load VIRTUS/2
cp "$VIRTUS_TEST_DATA"/* /data/$USER/VIRTUS
cd /data/$USER/VIRTUS
VIRTUS_wrapper.py input.csv \ 
--genomeDir_human $VIRTUS_INDEX/STAR_index_human \
--genomeDir_virus $VIRTUS_INDEX/STAR_index_virus \
--nthreads 40

Submit the job:

sbatch VIRTUS.sh
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.