Biowulf High Performance Computing at the NIH
ARIBA on Biowulf

ARIBA is a tool that identifies antibiotic resistance genes by running local assemblies. The input is a FASTA file of reference sequences (can be a mix of genes and noncoding sequences) and paired sequencing reads. ARIBA reports which of the reference sequences were found, plus detailed information on the quality of the assemblies and any variants between the sequencing reads and the reference sequences.

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows each prompt):

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load ariba
[+] Loading singularity  3.4.2  on cn3103 
[+] Loading ariba 2.14.4  ... 

[user@cn3144 ~]$ cd /data/user/ARIBA_TEST

[user@cn3144 ~]$ ls
DATASETS

[user@cn3144 ~]$ ariba pubmlstspecies
Achromobacter spp.
Acinetobacter baumannii#1
Acinetobacter baumannii#2
Aeromonas spp.
[...]
Xylella fastidiosa
Yersinia pseudotuberculosis
Yersinia ruckeri
Yersinia spp.

[user@cn3144 ~]$ ariba pubmlstget "Staphylococcus aureus" get_mlst
WARNING: Median sequence length is 456 but arcC.548 has length 522 which is too long or short. Removing.
WARNING: Median sequence length is 456 but arcC.567 has length 516 which is too long or short. Removing.
ariba db directory prepared. You can use it like this:
ariba run get_mlst/ref_db reads_1.fq reads_2.fq output_directory

[user@cn3144 ~]$ ls
DATASETS  get_mlst

[user@cn3144 ~]$ ls get_mlst
clusters.tsv  pubmlst_download	ref_db

[user@cn3144 ~]$ ariba run get_mlst/ref_db DATASETS/SA_reads_1.fastq DATASETS/SA_reads_2.fastq ariba_out

[user@cn3144 ~]$ ls
ariba_out  DATASETS  get_mlst

[user@cn3144 ~]$ ls ariba_out
ariba.tmp.6nvqjvdt     assembled_seqs.fa.gz  debug.report.tsv  mlst_report.details.tsv	report.tsv
assembled_genes.fa.gz  assemblies.fa.gz      log.clusters.gz   mlst_report.tsv		version_info.txt

[user@cn3144 ~]$ exit

salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
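
The session above ends with a report.tsv in the output directory. As a minimal sketch of a first look at that file, the helper below lists the distinct values in its first column. The name list_report_refs is hypothetical, and the sketch assumes that the report is tab-separated, that header lines start with '#', and that the first column names the matched reference sequence. Check the header of your own report before relying on it.

```shell
#!/bin/bash
# Sketch: list the distinct first-column values of a tab-separated report.
# list_report_refs is a hypothetical helper, not part of ARIBA.
# Assumptions: header lines start with '#', and column 1 holds the
# reference sequence name.

list_report_refs() {
    # Skip header and empty lines, print column 1, then de-duplicate.
    awk -F'\t' '!/^#/ && NF { print $1 }' "$1" | sort -u
}

# Example call (path follows the session above):
# list_report_refs /data/user/ARIBA_TEST/ariba_out/report.tsv
```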

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. ariba.sh). For example:

#!/bin/bash
set -e
module load ariba
ariba run /data/user/ARIBA_TEST/get_mlst/ref_db \
          /data/user/ARIBA_TEST/DATASETS/SA_reads_1.fastq \
          /data/user/ARIBA_TEST/DATASETS/SA_reads_2.fastq \
          /data/user/ARIBA_TEST/ariba_out

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] ariba.sh
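
Because a batch job runs unattended, a mistyped path is only noticed after the job has waited in the queue. The following sketch shows one way to check the four arguments of the ariba run line before starting. The function name check_ariba_inputs and the variables in the usage comment are hypothetical, and treating an existing output directory as an error is an assumption made here so that earlier results are not mixed with a new run.

```shell
#!/bin/bash
# Pre-flight check for an "ariba run" batch script (illustrative sketch).
# check_ariba_inputs is a hypothetical helper, not part of ARIBA or Biowulf.

check_ariba_inputs() {
    local ref_db=$1 reads_1=$2 reads_2=$3 outdir=$4
    local status=0
    # The prepared reference database is a directory (e.g. get_mlst/ref_db).
    [ -d "$ref_db" ]  || { echo "missing reference db: $ref_db" >&2; status=1; }
    # Both read files must exist and be non-empty.
    [ -s "$reads_1" ] || { echo "missing or empty reads: $reads_1" >&2; status=1; }
    [ -s "$reads_2" ] || { echo "missing or empty reads: $reads_2" >&2; status=1; }
    # Assumption: an existing output directory is treated as an error.
    [ ! -e "$outdir" ] || { echo "output already exists: $outdir" >&2; status=1; }
    return $status
}

# In ariba.sh, before the "ariba run" line (variable names are placeholders):
# check_ariba_inputs "$REF_DB" "$READS_1" "$READS_2" "$OUTDIR" || exit 1
```

All problems are reported in one pass rather than stopping at the first, so a single failed submission shows every path that needs fixing.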

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. ariba.swarm). For example:

ariba run /data/user/ARIBA_TEST/get_mlst/ref_db \
          /data/user/ARIBA_TEST/DATASETS/SA_a_reads_1.fastq \
          /data/user/ARIBA_TEST/DATASETS/SA_a_reads_2.fastq \
          /data/user/ARIBA_TEST/ariba_out_a
ariba run /data/user/ARIBA_TEST/get_mlst/ref_db \
          /data/user/ARIBA_TEST/DATASETS/SA_b_reads_1.fastq \
          /data/user/ARIBA_TEST/DATASETS/SA_b_reads_2.fastq \
          /data/user/ARIBA_TEST/ariba_out_b

Submit this job using the swarm command.

swarm -f ariba.swarm [-g #] [-t #] --module ariba
where
  -g #            Number of gigabytes of memory required for each process (1 line in the swarm command file).
  -t #            Number of threads/CPUs required for each process (1 line in the swarm command file).
  --module ariba  Loads the ariba module for each subjob in the swarm.
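
With more than a handful of samples, the swarmfile can be generated rather than typed by hand. The sketch below prints one ariba run line per read pair. The function name make_ariba_swarm is hypothetical, and it assumes that read pairs are named <sample>_reads_1.fastq and <sample>_reads_2.fastq as in the swarmfile example above, and that paths contain no spaces. Output directories are named ariba_out_<sample>, which differs slightly from the hand-written example.

```shell
#!/bin/bash
# Sketch: print one "ariba run" line per read pair, for use as a swarmfile.
# make_ariba_swarm is a hypothetical helper, not part of ARIBA or swarm.
# Assumes pairs are named <sample>_reads_1.fastq / <sample>_reads_2.fastq.

make_ariba_swarm() {
    local ref_db=$1 reads_dir=$2 out_root=$3
    local r1 r2 sample
    for r1 in "$reads_dir"/*_reads_1.fastq; do
        [ -e "$r1" ] || continue              # no matches: the glob stays literal
        r2=${r1%_1.fastq}_2.fastq             # derive the mate file name
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no mate file $r2" >&2
            continue
        fi
        sample=$(basename "$r1" _reads_1.fastq)
        echo "ariba run $ref_db $r1 $r2 $out_root/ariba_out_$sample"
    done
}

# Example call (paths follow the examples above):
# make_ariba_swarm /data/user/ARIBA_TEST/get_mlst/ref_db \
#                  /data/user/ARIBA_TEST/DATASETS \
#                  /data/user/ARIBA_TEST > ariba.swarm
```

Each line is written as a single command without continuation backslashes, so every line of the generated file corresponds to one subjob.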