Biowulf High Performance Computing at the NIH
transrate on Biowulf

Transrate is software for de novo transcriptome assembly quality analysis. It examines your assembly in detail and compares it to experimental evidence such as the sequencing reads, reporting quality scores for contigs and assemblies. This allows you to choose between assemblers and parameters, filter out bad contigs from an assembly, and decide when to stop trying to improve the assembly.
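For reference, a typical full run supplies the paired-end reads and an optional reference proteome alongside the assembly, so that read-mapping and comparative metrics are computed in addition to the contig metrics. This is only a sketch; the file names below are placeholders to be replaced with your own data:

```shell
# Placeholder file names -- substitute your own assembly, reads, and reference.
transrate --assembly=transcripts.fa \
          --left=reads_1.fq \
          --right=reads_2.fq \
          --reference=reference_proteins.fa \
          --output=transrate_full \
          --threads=$SLURM_CPUS_PER_TASK
```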


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input in bold):

[user@biowulf]$ sinteractive --cpus-per-task=4
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ tar xvzf /usr/local/apps/transrate/example_data.tar.gz

[user@cn3144 ~]$ cd example_data

[user@cn3144 ~]$ module load transrate

[user@cn3144 ~]$ transrate --assembly=transcripts.fa --threads=$SLURM_CPUS_PER_TASK
[ INFO] 2018-09-28 15:28:07 : Loading assembly: /data/user/transrate/example_data/transcripts.fa
[ INFO] 2018-09-28 15:28:07 : Analysing assembly: /data/user/transrate/example_data/transcripts.fa
[ INFO] 2018-09-28 15:28:07 : Results will be saved in /data/user/transrate/example_data/transrate_results/transcripts
[ INFO] 2018-09-28 15:28:07 : Calculating contig metrics...
[ INFO] 2018-09-28 15:28:07 : Contig metrics:
[ INFO] 2018-09-28 15:28:07 : -----------------------------------
[ INFO] 2018-09-28 15:28:07 : n seqs                           15
[ INFO] 2018-09-28 15:28:07 : smallest                        849
[ INFO] 2018-09-28 15:28:07 : largest                        2396
[ INFO] 2018-09-28 15:28:07 : n bases                       28562
[ INFO] 2018-09-28 15:28:07 : mean len                    1904.13
[ INFO] 2018-09-28 15:28:07 : n under 200                       0
[ INFO] 2018-09-28 15:28:07 : n over 1k                        14
[ INFO] 2018-09-28 15:28:07 : n over 10k                        0
[ INFO] 2018-09-28 15:28:07 : n with orf                       15
[ INFO] 2018-09-28 15:28:07 : mean orf percent              46.46
[ INFO] 2018-09-28 15:28:07 : n90                            1612
[ INFO] 2018-09-28 15:28:07 : n70                            1681
[ INFO] 2018-09-28 15:28:07 : n50                            2037
[ INFO] 2018-09-28 15:28:07 : n30                            2288
[ INFO] 2018-09-28 15:28:07 : n10                            2385
[ INFO] 2018-09-28 15:28:07 : gc                             0.53
[ INFO] 2018-09-28 15:28:07 : bases n                           0
[ INFO] 2018-09-28 15:28:07 : proportion n                    0.0
[ INFO] 2018-09-28 15:28:07 : Contig metrics done in 0 seconds
[ INFO] 2018-09-28 15:28:07 : No reads provided, skipping read diagnostics
[ INFO] 2018-09-28 15:28:07 : No reference provided, skipping comparative diagnostics
[ INFO] 2018-09-28 15:28:07 : Writing contig metrics for each contig to /data/user/transrate/example_data/transrate_results/transcripts/contigs.csv
[ INFO] 2018-09-28 15:28:07 : Writing analysis results to assemblies.csv

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
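The per-contig metrics written to contigs.csv can be filtered with standard tools. When reads are supplied, transrate itself writes the good contigs to a fasta file, so the snippet below is just an illustration of working with the CSV; the column layout here (contig_name, length, score) is an assumption standing in for real transrate output from a read-based run:

```shell
# Toy contigs.csv standing in for transrate output; the column names and
# values here are assumptions for illustration, not real transrate results.
cat > contigs.csv <<'EOF'
contig_name,length,score
contig1,1200,0.91
contig2,850,0.12
contig3,2300,0.77
EOF

# Keep the names of contigs whose score exceeds a chosen cutoff (0.5 here),
# skipping the header line.
awk -F, 'NR>1 && $3 > 0.5 {print $1}' contigs.csv > good_contigs.txt
cat good_contigs.txt
```

The cutoff is arbitrary; transrate reports an assembly-specific optimal score cutoff when reads are provided, which is usually the better choice.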

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. transrate.sh). For example:

#!/bin/bash
set -e
module load transrate
transrate --assembly transcripts.fa \
          --left left.fq \
          --right right.fq \
          --threads $SLURM_CPUS_PER_TASK

Submit this job using the Slurm sbatch command.

sbatch [--mem=#] --cpus-per-task=8 transrate.sh

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. transrate.swarm). Make sure the outputs from each command go to separate directories (via --output); otherwise the commands will overwrite each other's files.

transrate --assembly=transcripts1.fa --output=transrate1 --threads=$SLURM_CPUS_PER_TASK
transrate --assembly=transcripts2.fa --output=transrate2 --threads=$SLURM_CPUS_PER_TASK
transrate --assembly=transcripts3.fa --output=transrate3 --threads=$SLURM_CPUS_PER_TASK
[...]
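A swarmfile like the one above can be generated with a short loop rather than typed by hand. This is a sketch that assumes assemblies named transcripts*.fa in the current directory; adjust the glob and output naming to your data:

```shell
# Placeholder assemblies so the loop below has input; in practice these
# would be your real assembly fasta files.
touch transcripts1.fa transcripts2.fa transcripts3.fa

# Emit one transrate command per assembly, each with its own output
# directory, into the swarmfile.
for f in transcripts*.fa; do
    base=${f%.fa}
    echo "transrate --assembly=$f --output=transrate_$base --threads=\$SLURM_CPUS_PER_TASK"
done > transrate.swarm

cat transrate.swarm
```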

Submit this job using the swarm command.

swarm -f transrate.swarm [-g #] -t # --module transrate
where
-g # Number of Gigabytes of memory required for each process (1 line in the swarm command file)
-t # Number of threads/CPUs required for each process (1 line in the swarm command file).
--module transrate Loads the transrate module for each subjob in the swarm