TransDecoder on Biowulf

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly with Trinity or constructed from RNA-Seq alignments to the genome with TopHat and Cufflinks.

TransDecoder identifies likely coding sequences based on the following criteria:

- An open reading frame (ORF) of at least a minimum length is found in the transcript sequence.
- A log-likelihood score, similar to the one computed by the GeneID software, is greater than 0.
- This coding score is greatest when the ORF is scored in its first reading frame, compared with the scores in the other five reading frames.
- If a candidate ORF is fully encapsulated by the coordinates of another candidate ORF, the longer one is reported. However, a single transcript can report multiple ORFs (allowing for operons, chimeras, etc.).
- Optionally, the putative peptide has a match to a Pfam domain above the noise cutoff score.
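
The rule about encapsulated ORFs can be illustrated with a small toy filter. This is only a sketch and not TransDecoder's own code: the function name `filter_nested_orfs` and the "start end" input format are assumptions made for the example, and strand and reading frame are ignored.

```shell
#!/bin/bash
# Toy illustration only, not TransDecoder's implementation.
# Input (stdin): one candidate ORF per line as "start end" (hypothetical format).
# Output: the candidates that are NOT fully contained in a longer candidate.
filter_nested_orfs() {
    awk '
        { lo[NR] = $1 + 0; hi[NR] = $2 + 0 }   # store coordinates as numbers
        END {
            for (i = 1; i <= NR; i++) {
                nested = 0
                for (j = 1; j <= NR; j++) {
                    longer = (hi[j] - lo[j]) > (hi[i] - lo[i])
                    # ORF i is dropped when a longer ORF j spans it completely
                    if (longer && lo[j] <= lo[i] && hi[i] <= hi[j]) {
                        nested = 1
                    }
                }
                if (!nested) print lo[i], hi[i]
            }
        }'
}
```

For example, `printf '1 300\n50 200\n400 700\n' | filter_nested_orfs` keeps the ORFs at 1-300 and 400-700 and drops 50-200, which lies entirely inside the first one. The two ORFs that remain show how a single transcript can still report more than one ORF.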

Documentation

TransDecoder Main Site: TransDecoder at GitHub

Important Notes

Module Name: TransDecoder (see the modules page for more information)
Environment variables set
- TRANSDECODER_HOME
- TRANSDECODER_EXAMPLES

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load TransDecoder
[user@cn3144 ~]$ cp $TRANSDECODER_EXAMPLES/simple_transcriptome_target/* .
[user@cn3144 ~]$ ./runMe.sh

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job

Most jobs should be run as batch jobs.

Create a batch input file (e.g. TransDecoder.sh). For example:

#!/bin/bash
module load TransDecoder
cufflinks_gtf_genome_to_cdna_fasta.pl transcripts.gtf test.genome.fasta > transcripts.fasta

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] TransDecoder.sh
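
The example script above only converts the Cufflinks GTF file into a transcript FASTA file. ORF prediction on that file is usually a two-step run: TransDecoder.LongOrfs extracts the long ORFs, then TransDecoder.Predict selects the likely coding regions. The sketch below assumes both programs are on the PATH after `module load TransDecoder`, as TransDecoder.LongOrfs is in the swarm example. The helper names `run` and `predict_orfs` and the `DRY_RUN` variable are hypothetical and exist only for this illustration.

```shell
#!/bin/bash
# Sketch of a two-step ORF prediction; helper names are hypothetical.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1 extracts long ORFs; step 2 predicts the likely coding regions.
# Step 2 is skipped if step 1 fails.
predict_orfs() {
    local transcripts="$1"
    run TransDecoder.LongOrfs -t "$transcripts" || return 1
    run TransDecoder.Predict -t "$transcripts"
}
```

In a batch script, these definitions would follow `module load TransDecoder`, and a call such as `predict_orfs transcripts.fasta` would come after the conversion step. Setting `DRY_RUN=1` first prints the two commands without running them, which is a cheap way to check the script before submitting it.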

Swarm of Jobs

A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. TransDecoder.swarm). For example:

TransDecoder.LongOrfs -t target_transcripts1.fasta
TransDecoder.LongOrfs -t target_transcripts2.fasta
TransDecoder.LongOrfs -t target_transcripts3.fasta
TransDecoder.LongOrfs -t target_transcripts4.fasta

Submit this job using the swarm command.

swarm -f TransDecoder.swarm [-g #] [-t #] --module TransDecoder

where

`-g #`	Number of gigabytes of memory required for each process (one line in the swarm command file)
`-t #`	Number of threads/CPUs required for each process (one line in the swarm command file)
`--module TransDecoder`	Loads the TransDecoder module for each subjob in the swarm
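
When there are many input files, the swarmfile can be generated instead of typed by hand. The following is a minimal sketch: the function name `make_swarmfile`, the `.fasta` extension, and the directory layout are assumptions, and file names containing spaces are not handled.

```shell
#!/bin/bash
# Sketch: write one TransDecoder.LongOrfs command per FASTA file.
# make_swarmfile is a hypothetical helper, not part of TransDecoder or swarm.
make_swarmfile() {
    local indir="$1" swarmfile="$2" fasta
    : > "$swarmfile"                      # start from an empty swarmfile
    for fasta in "$indir"/*.fasta; do
        [ -e "$fasta" ] || continue       # skip when the glob matched nothing
        echo "TransDecoder.LongOrfs -t $fasta" >> "$swarmfile"
    done
}
```

A call such as `make_swarmfile /path/to/transcripts TransDecoder.swarm` (the directory is a placeholder) produces a file in the same form as the example swarmfile, ready to submit with the swarm command shown above.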