TandemTools on Biowulf

The TandemTools package includes TandemQUAST, a tool for evaluating and improving assemblies of extra-long tandem repeats (ETRs), and TandemMapper, a tool for mapping long error-prone reads to ETRs. Note: TandemTools is designed specifically for ETRs, which range in length from hundreds of thousands to millions of nucleotides. Running TandemTools on shorter tandem repeats is strongly discouraged.
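
As a rough pre-flight check for the length requirement above, the total sequence length of an assembly file can be computed before running the tools. This is a minimal sketch, assuming a plain (uncompressed) FASTA file; the function names `fasta_total_length` and `check_etr_length` and the 100,000-nucleotide threshold are illustrative choices and are not part of TandemTools.

```shell
#!/bin/bash
# Sketch: report the total sequence length in a FASTA file (hypothetical helper).
fasta_total_length() {
    # Skip header lines (starting with ">"), strip blanks and carriage returns,
    # and sum the lengths of the remaining sequence lines.
    awk '!/^>/ { gsub(/[ \t\r]/, ""); total += length($0) } END { print total + 0 }' "$1"
}

# ASSUMPTION: 100000 nt is an illustrative lower bound, not a TandemTools setting.
MIN_LENGTH=100000

check_etr_length() {
    local len
    len=$(fasta_total_length "$1")
    if [ "$len" -lt "$MIN_LENGTH" ]; then
        echo "Warning: $1 contains only $len nt; TandemTools is intended for longer repeats." >&2
        return 1
    fi
    echo "$1: $len nt"
}
```

For example, `check_etr_length simulated_polished.fa` prints the total length, or a warning on standard error if the file falls below the chosen threshold.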


Submitting an interactive job

Allocate an interactive session and run the program there.

[biowulf]$ sinteractive  --mem=10g  --cpus-per-task=4
salloc.exe: Granted job allocation 789523
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0135 are ready for job

[cn0135]$ cd /data/$USER/

[cn0135]$ module load tandemtools

[cn0135]$ cp -r /usr/local/apps/tandemtools/test_data .

[cn0135]$ cd test_data

[cn0135]$ $ -t $SLURM_CPUS_PER_TASK --nano simulated_reads.fasta simulated_polished.fa -o simulated_res

[cn0135]$ exit
salloc.exe: Job allocation 789523 has been revoked.

Note: this job allocates 10 GB of memory and 4 CPUs. Slurm automatically stores the number of allocated CPUs in the variable $SLURM_CPUS_PER_TASK.
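
Slurm sets $SLURM_CPUS_PER_TASK only when the job is requested with --cpus-per-task. A minimal sketch of a fallback for scripts that may also run outside such an allocation; the variable name `THREADS` is an illustrative choice:

```shell
#!/bin/bash
# Use the Slurm-provided CPU count when present; otherwise fall back to 1 thread.
THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with $THREADS thread(s)"
```

The value of `$THREADS` can then be passed to the -t option in place of $SLURM_CPUS_PER_TASK.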

Submitting a single batch job

1. Create a script file (myscript) similar to the one below:

#! /bin/bash
# myscript
set -e

module load tandemtools || exit 1
cd /data/$USER/test_data/
-t $SLURM_CPUS_PER_TASK --nano simulated_reads.fasta simulated_polished.fa -o simulated_res

2. Submit the script on Biowulf:

[biowulf]$ sbatch --mem=10g --cpus-per-task=4 myscript

Using Swarm

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/$USER/cmdfile) with one command per line:

cd /data/$USER/dir1; -t $SLURM_CPUS_PER_TASK...
cd /data/$USER/dir2; -t $SLURM_CPUS_PER_TASK...
cd /data/$USER/dir3; -t $SLURM_CPUS_PER_TASK...
cd /data/$USER/dir20; -t $SLURM_CPUS_PER_TASK...
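
A command file with many near-identical lines can be generated with a short loop rather than written by hand. The sketch below rests on several assumptions that the text above does not confirm: that the command is `tandemquast.py`, that the directories are numbered dir1 through dir20 without gaps, and that each directory holds input files named `reads.fasta` and `assembly.fa` (hypothetical names, as is the output directory `results`).

```shell
#!/bin/bash
# Sketch: build a swarm command file with one line per data directory.
# ASSUMPTION: the program is tandemquast.py; adjust TOOL if another is intended.
TOOL="tandemquast.py"
CMDFILE="${CMDFILE:-cmdfile}"
BASE="${BASE:-/data/$USER}"

# Start from an empty command file.
: > "$CMDFILE"

for i in $(seq 1 20); do
    # The single-quoted format string keeps $SLURM_CPUS_PER_TASK unexpanded,
    # so it is resolved on the compute node when the command runs.
    # ASSUMPTION: reads.fasta, assembly.fa and results are placeholder names.
    printf 'cd %s/dir%d; %s -t $SLURM_CPUS_PER_TASK --nano reads.fasta assembly.fa -o results\n' \
        "$BASE" "$i" "$TOOL" >> "$CMDFILE"
done
```

Inspect the generated file (for example with `head cmdfile`) before submitting it.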

Submit the swarm job, requesting 10 GB of memory (-g) and 4 threads (-t) per command:

$ swarm -f cmdfile --module tandemtools -g 10 -t 4

