Biowulf High Performance Computing at the NIH
TandemTools on Biowulf

The TandemTools package includes the TandemQUAST tool for evaluating and improving assemblies of extra-long tandem repeats (ETRs) and the TandemMapper tool for mapping long error-prone reads to ETRs. Note: TandemTools is designed specifically for ETRs, which range in length from hundreds of thousands to millions of nucleotides. Running TandemTools on shorter TRs is strongly discouraged.

Documentation

https://github.com/ablab/TandemTools

Important Notes
Submitting an interactive job

Allocate an interactive session and run the interactive job there.

[biowulf]$ sinteractive  --mem=10g  --cpus-per-task=4
salloc.exe: Granted job allocation 789523
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0135 are ready for job

[cn0135]$ cd /data/$USER/

[cn0135]$ module load tandemtools

[cn0135]$ cp -r /usr/local/apps/tandemtools/test_data .

[cn0135]$ cd test_data

[cn0135]$ tandemquast.py -t $SLURM_CPUS_PER_TASK --nano simulated_reads.fasta simulated_polished.fa -o simulated_res

[cn0135]$ exit
salloc.exe: Job allocation 789523 has been revoked.
[biowulf]$

Note: this job allocates 10 GB of memory and 4 CPUs. Slurm automatically sets the variable $SLURM_CPUS_PER_TASK to the number of CPUs allocated.
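
Slurm sets $SLURM_CPUS_PER_TASK only inside an allocation that requests --cpus-per-task. As a minimal sketch, assuming you may also want a command line that still works when the variable is unset, a fallback to a single thread could look like the following. The helper name resolve_threads is hypothetical and is not part of TandemTools or Slurm.

```shell
#!/bin/bash
# Sketch: pick a thread count that works with or without a Slurm allocation.
# resolve_threads is a hypothetical helper, not part of TandemTools or Slurm.
resolve_threads() {
    # Use SLURM_CPUS_PER_TASK when Slurm has set it; otherwise fall back to 1.
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

THREADS=$(resolve_threads)
# Only print the value here; pass it to tandemquast.py as "-t ${THREADS}".
echo "thread count: ${THREADS}"
```

The resulting value can then be passed to tandemquast.py with -t in place of $SLURM_CPUS_PER_TASK.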

Submitting a single batch job

1. Create a script file (myscript) similar to the one below:

#!/bin/bash
# myscript
set -e

module load tandemtools || exit 1
cd /data/$USER/test_data/
tandemquast.py -t $SLURM_CPUS_PER_TASK --nano simulated_reads.fasta simulated_polished.fa -o simulated_res

2. Submit the script on biowulf:

[biowulf]$ sbatch --mem=10g --cpus-per-task=4 myscript
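
The resource requests given on the sbatch command line can also be stored in the script itself as #SBATCH directives. The sketch below writes such a variant of the script to a file. The file name myscript_sbatch is an assumption chosen for illustration, and the memory and CPU values simply repeat the ones used above.

```shell
#!/bin/bash
# Sketch: write a batch script that carries its own resource requests.
# The file name myscript_sbatch is hypothetical.
# The quoted 'EOF' keeps $USER and $SLURM_CPUS_PER_TASK unexpanded until the job runs.
cat > myscript_sbatch <<'EOF'
#!/bin/bash
#SBATCH --mem=10g
#SBATCH --cpus-per-task=4
set -e

module load tandemtools
cd /data/$USER/test_data/
tandemquast.py -t $SLURM_CPUS_PER_TASK --nano simulated_reads.fasta simulated_polished.fa -o simulated_res
EOF
```

With the directives in place, the script can be submitted as "sbatch myscript_sbatch". Options given on the sbatch command line still take precedence over the directives in the file.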

Using Swarm

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g., /data/$USER/cmdfile), with one command per line:

cd /data/$USER/dir1; tandemquast.py -t $SLURM_CPUS_PER_TASK ...
cd /data/$USER/dir2; tandemquast.py -t $SLURM_CPUS_PER_TASK ...
cd /data/$USER/dir3; tandemquast.py -t $SLURM_CPUS_PER_TASK ...
...
cd /data/$USER/dir20; tandemquast.py -t $SLURM_CPUS_PER_TASK ...
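
A command file with one line per directory can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: make_cmdfile is a hypothetical helper, the input and output names (reads.fasta, assembly.fa, results) are placeholders for your own data, and each directory is assumed to follow the dirN naming shown above.

```shell
#!/bin/bash
# Sketch: write one swarm command line per working directory.
# make_cmdfile and the file names passed to it are hypothetical placeholders.
make_cmdfile() {
    base_dir="$1"    # parent directory holding dir1 ... dirN
    count="$2"       # number of directories
    tool_args="$3"   # options and inputs passed to tandemquast.py
    out_file="$4"    # swarm command file to write

    : > "$out_file"  # start with an empty file
    i=1
    while [ "$i" -le "$count" ]; do
        # The single-quoted format keeps $SLURM_CPUS_PER_TASK literal,
        # so it is expanded when the swarm subjob runs, not now.
        printf 'cd %s/dir%d; tandemquast.py -t $SLURM_CPUS_PER_TASK %s\n' \
            "$base_dir" "$i" "$tool_args" >> "$out_file"
        i=$((i + 1))
    done
}

make_cmdfile "/data/${USER:-user}" 20 "--nano reads.fasta assembly.fa -o results" cmdfile
```

The generated cmdfile is written to the current directory and can be inspected with "head cmdfile" before it is submitted.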

Submit the swarm job, requesting 10 GB of memory (-g) and 4 threads (-t) per command:

[biowulf]$ swarm -f cmdfile --module tandemtools -g 10 -t 4

For more information on running swarm, see the swarm documentation (swarm.html).