High-Performance Computing at the NIH
IVA on Biowulf & Helix

Description

Integrated virus assembler (IVA) is a de novo assembler designed for virus genomes without repeat sequences, using Illumina read pairs from mixed populations at extremely high and variable depth.

There are multiple versions of IVA available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail iva

To select a module use

module load iva/[version]

where [version] is the version of choice.

IVA is a multithreaded application. Make sure the value passed to --threads matches the number of CPUs allocated to the job.
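
One way to keep the two numbers in sync is to derive the thread count from the Slurm allocation rather than hard-coding it. The following is a minimal sketch: the helper name iva_threads and the fallback value of 2 are assumptions chosen for illustration, and the read file names are placeholders.

```bash
#!/bin/bash
# Print the thread count to pass to iva: the Slurm allocation if it is set,
# otherwise a fallback of 2 (an assumed default, not an IVA or Slurm setting).
iva_threads() {
  echo "${SLURM_CPUS_PER_TASK:-2}"
}

# Show the command that would be run; the file names are placeholders.
echo "iva --threads $(iva_threads) -f reads_1.fq.gz -r reads_2.fq.gz iva.out"
```

Inside a batch or swarm job, SLURM_CPUS_PER_TASK is set by Slurm when CPUs are requested with --cpus-per-task, so the fallback only applies outside of such a job.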

Environment variables set

Dependencies

IVA depends on a number of external tools. These are loaded automatically by the IVA module:

References

Web sites

On Helix

Run the test data set automatically (version >=1.0.2)

helix$ module load iva/1.0.2
helix$ iva --test --threads=2 --trimmomatic=$TRIMMOJAR test
Running iva in test mode...
Copied input test files into here: /data/user/iva/test
Current working directory: /data/user/iva/test
Running iva on the test data with the command:
/usr/local/apps/iva/1.0.2/bin/iva --threads 2 --pcr_primers hiv_pcr_primers.fa \
  --trimmomatic /usr/local/apps/trimmomatic/Trimmomatic-0.33/trimmomatic-0.33.jar \
  -f reads_1.fq.gz -r reads_2.fq.gz iva.out
Finished running iva
helix$ tree test
test
|-- [user    672]  hiv_pcr_primers.fa
|-- [user   9.7K]  iva_contigs_no_trimmomatic.fasta
|-- [user   9.0K]  iva_contigs_with_trimmomatic.fasta
|-- [user   4.0K]  iva.out
|   |-- [user    10K]  contigs.fasta
|   `-- [user    313]  info.txt
|-- [user   3.6M]  reads_1.fq.gz
|-- [user   4.4M]  reads_2.fq.gz
`-- [user   9.0K]  reference.fasta
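
A quick sanity check after the test run is to count the contigs in the output. The sketch below defines a hypothetical helper, count_contigs, that counts FASTA header lines. Treating iva_contigs_with_trimmomatic.fasta as the expected result for comparison is an assumption based on its file name.

```bash
#!/bin/bash
# Count the sequences in a FASTA file by counting header lines,
# i.e. lines that start with ">".
count_contigs() {
  grep -c '^>' "$1"
}
```

For example, running count_contigs test/iva.out/contigs.fasta and count_contigs test/iva_contigs_with_trimmomatic.fasta prints the number of contigs in each file, which can then be compared by eye.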

Run IVA with trimmomatic on the data copied to test by the automated test

helix$ cd test
helix$ iva --threads 2 --pcr_primers hiv_pcr_primers.fa \
  -f reads_1.fq.gz -r reads_2.fq.gz \
  --trimmomatic $TRIMMOJAR \
  --ctg_first_trim 15 \
  iva.out2

Batch job on Biowulf

Create a batch script similar to the following:

#! /bin/bash

function fail {
  echo "$@" >&2
  exit 1
}

module load iva || fail "could not load iva module"
iva --threads $SLURM_CPUS_PER_TASK \
  -f read1.fq.gz -r read2.fq.gz \
  --trimmomatic=$TRIMMOJAR \
  sample.iva || fail "iva returned non-zero exit status"

Submit to the queue with sbatch:

b2$ sbatch --cpus-per-task 6 iva.sh
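
Because a job may wait in the queue before it starts, it can be worth checking that the read files are present before IVA is launched. The following is a minimal sketch of such a check; the function name inputs_ok is hypothetical and not part of IVA or the module.

```bash
#!/bin/bash
# Return 0 only if every file given as an argument exists and is non-empty.
# On the first problem, report the offending file on stderr and return 1.
inputs_ok() {
  local f
  for f in "$@"; do
    if [[ ! -s "$f" ]]; then
      echo "missing or empty input: $f" >&2
      return 1
    fi
  done
}
```

In the batch script this could be called before iva, for example as inputs_ok read1.fq.gz read2.fq.gz || fail "input reads missing".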

Swarm of jobs on Biowulf

Create a swarm command file similar to the following:

iva -f sample1_r1.fq.gz -r sample1_r2.fq.gz --trimmomatic=$TRIMMOJAR --threads $SLURM_CPUS_PER_TASK sample1_out
iva -f sample2_r1.fq.gz -r sample2_r2.fq.gz --trimmomatic=$TRIMMOJAR --threads $SLURM_CPUS_PER_TASK sample2_out
iva -f sample3_r1.fq.gz -r sample3_r2.fq.gz --trimmomatic=$TRIMMOJAR --threads $SLURM_CPUS_PER_TASK sample3_out
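
For more than a handful of samples, the command file can be generated rather than typed by hand. The sketch below assumes reads are named <sample>_r1.fq.gz (forward, passed to -f) and <sample>_r2.fq.gz (reverse, passed to -r). The function name make_iva_swarm is hypothetical.

```bash
#!/bin/bash
# Print one swarm command line per sample name given as an argument.
# Assumes the naming convention <sample>_r1.fq.gz / <sample>_r2.fq.gz.
make_iva_swarm() {
  local sample
  for sample in "$@"; do
    # The single-quoted format string keeps $TRIMMOJAR and
    # $SLURM_CPUS_PER_TASK literal, so they are expanded when the job runs.
    printf 'iva -f %s_r1.fq.gz -r %s_r2.fq.gz --trimmomatic=$TRIMMOJAR --threads $SLURM_CPUS_PER_TASK %s_out\n' \
      "$sample" "$sample" "$sample"
  done
}

# Print the lines for three example samples.
make_iva_swarm sample1 sample2 sample3
```

Redirecting the output to a file, for example make_iva_swarm sample1 sample2 sample3 > iva.swarm, produces a command file in the form used above.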

And submit to the queue with swarm:

b2$ swarm -f iva.swarm -t 4 -g 10 --module iva/1.0.2

Interactive job on Biowulf

Allocate an interactive session with sinteractive and run IVA as described above:

b2$ sinteractive
node$ module load iva
node$ iva -f read1.fq.gz -r read2.fq.gz iva_out