Trinotate on Biowulf

Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms. Trinotate makes use of a number of well-referenced methods for functional annotation, including homology search against known sequence data (BLAST+/SwissProt), protein domain identification (HMMER/Pfam), protein signal peptide and transmembrane domain prediction (SignalP/TMHMM), and various annotation databases (eggNOG/GO/KEGG). All functional annotation data derived from the analysis of transcripts is integrated into a SQLite database. The database allows fast, efficient searches for terms related to a specific scientific hypothesis, and it can also be used to generate a complete annotation report for a transcriptome.

Users MUST allocate lscratch to run Trinotate, because one of its dependencies, SignalP, requires a temporary directory that defaults to lscratch.
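A job script can check for the lscratch allocation before it does any work, so that a missing `--gres=lscratch:N` fails immediately instead of partway through a SignalP step. The sketch below is a hypothetical helper (`require_lscratch` is not part of Trinotate or Biowulf); it assumes the /lscratch/$SLURM_JOBID directory used in the sample session on this page exists only when lscratch was requested.

```shell
# Hypothetical pre-flight check for the top of a Trinotate job script.
# Assumption: /lscratch/$SLURM_JOBID exists only if lscratch was allocated.
require_lscratch() {
    local dir="/lscratch/${SLURM_JOBID:-}"
    if [ -z "${SLURM_JOBID:-}" ] || [ ! -d "$dir" ]; then
        echo "ERROR: $dir not found; request it with --gres=lscratch:N" >&2
        return 1
    fi
    # Print the scratch path so the caller can cd into it
    echo "$dir"
}

# Usage inside a job script:
#   LSCRATCH=$(require_lscratch) || exit 1
#   cd "$LSCRATCH"
```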

March 2023: The documentation below is for Trinotate v3.2.0. The latest version, 4.0.0, differs in many ways, primarily because it runs as a Singularity container. Version 4.0.0 is available on Biowulf for testing.


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session:

The example below runs the runMe.sh script provided with Trinotate. The script pulls down the Trinotate SQLite boilerplate database, populates it with the provided precomputed bioinformatics results, and generates the final Trinotate annotation report.
[user@biowulf]$ sinteractive --mem=10g --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load trinotate

[user@cn3144 ~]$ cd /lscratch/$SLURM_JOBID

[user@cn3144 ~]$ cp -r $TRINOTATE_HOME/sample_data . 

[user@cn3144 ~]$ cd sample_data

[user@cn3144 ~]$ ./runMe.sh
edgeR_trans/
edgeR_trans/Trinity_trans.counts.matrix.heatshock_vs_plateau.edgeR.DE_results.samples
edgeR_trans/Trinity_trans.counts.matrix.diauxic_shift_vs_log_growth.diauxic_shift.vs.log_growth.EdgeR.Rscript
edgeR_trans/diffExpr.P0.1_C1.matrix
edgeR_trans/Trinity_trans.counts.matrix.heatshock_vs_log_growth.edgeR.DE_results.P0.1_C1.heatshock-UP.subset
edgeR_trans/clusters_fixed_P_60.heatmap.heatmap.pdf
edgeR_trans/Trinity_trans.counts.matrix.heatshock_vs_log_growth.edgeR.DE_results.MA_n_Volcano.pdf
edgeR_trans/diffExpr.P0.1_C1.matrix.RData
edgeR_trans/Trinity_trans.counts.matrix.log_growth_vs_plateau.edgeR.DE_results.MA_n_Volcano.pdf
edgeR_trans/Trinity_trans.counts.matrix.log_growth_vs_plateau.edgeR.DE_results.samples
[...]

###########################
Generating report table
###########################
#########################################
Extracting Gene Ontology Mappings Per Gene
#########################################
##########################
done. See annotation summary file: Trinotate_report.xls
##########################

[user@cn3144 ~]$ ls -rtl
total 498272
drwxr-x--- 2 user user      4096 Mar  8  2016 edgeR_genes
drwxr-x--- 3 user user     12288 Mar  8  2016 edgeR_trans
-rwxr-x--- 1 user user       755 Aug 22 15:53 cleanme.pl
-rwxr-x--- 1 user user      5678 Aug 22 15:53 runMe.sh
drwxr-x--- 2 user user      4096 Aug 22 15:53 data
-rw-r----- 1 user user  18445845 Aug 22 15:53 Trinotate_report.xls
-rw-r----- 1 user user   6479239 Aug 22 15:54 Trinotate_report.xls.gene_ontology
-rw-r----- 1 user user 478552064 Aug 22 15:54 myTrinotate.sqlite
-rw-r----- 1 user user       544 Aug 22 15:54 Trinotate_report_stats.taxonomy_counts
-rw-r----- 1 user user       177 Aug 22 15:54 Trinotate_report_stats.species_counts
-rw-r----- 1 user user       232 Aug 22 15:54 Trinotate_report_stats.eggnog_counts
-rw-r----- 1 user user       170 Aug 22 15:54 Trinotate_report_stats.eggnog_counts.funcats
-rw-r----- 1 user user    130768 Aug 22 15:54 Trinotate_report_stats.kegg.counts
-rw-r----- 1 user user        11 Aug 22 15:54 Trinotate_report_stats.pfam.counts
-rw-r----- 1 user user   6479239 Aug 22 15:54 Trinotate_report_stats.GO
-rw-r----- 1 user user     47526 Aug 22 15:54 Trinotate_report_stats.GO.slim
-rw-r----- 1 user user     16367 Aug 22 15:54 Trinotate_report_stats.cXp_summary.html

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Trinotate produces tab-delimited reports with an .xls extension, which open in Excel, along with an HTML summary. The easiest way to view them is to use hpcdrive to mount your Biowulf /home or /data area on your desktop and then open the file. In the test job above, the output is written to /lscratch/$SLURM_JOBID (temporary local disk on the compute node), so copy the files you want to keep to your /data area before exiting the session.
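One way to do that copy is sketched below. `save_trinotate_results` is a hypothetical helper, the file patterns are taken from the `ls -rtl` listing in the sample session, and the destination /data/$USER/trinotate_sample is only an example.

```shell
# Hypothetical helper: copy the Trinotate outputs from node-local scratch
# to a permanent directory before the session ends.
save_trinotate_results() {
    local dest="$1"
    mkdir -p "$dest" || return 1
    # Reports, per-category stats files, and the SQLite database itself
    cp -v Trinotate_report* myTrinotate.sqlite "$dest"/
}

# Example, run from /lscratch/$SLURM_JOBID/sample_data before exiting:
#   save_trinotate_results /data/$USER/trinotate_sample
```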

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. trinotate.sh). For example:

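#!/bin/bash
set -e
module load trinotate
# Load the transcripts, gene/transcript map and predicted proteins into Trinotate.sqlite.
# The file names are example Trinity/TransDecoder outputs; replace them with your own.
$TRINOTATE_HOME/Trinotate Trinotate.sqlite init \
    --gene_trans_map Trinity.fasta.gene_trans_map \
    --transcript_fasta Trinity.fasta \
    --transdecoder_pep Trinity.fasta.transdecoder.pep
# etc.: load the BLAST, Pfam, SignalP and TMHMM results, then generate the report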
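The remaining steps load each analysis result into the database and then write the report. The sketch below is one hypothetical way to script them for Trinotate v3: `load_result` and `load_all_and_report` are illustrative helpers, the result file names are placeholders in the style of the Trinotate v3 documentation, and the `LOAD_*` subcommand names should be checked against the usage message of `$TRINOTATE_HOME/Trinotate` for the version you load.

```shell
# Hypothetical helpers for the "etc" step of trinotate.sh.
# Assumes Trinotate.sqlite has already been initialized with "init".
TRINOTATE="$TRINOTATE_HOME/Trinotate"
DB=Trinotate.sqlite

# Load one result file, skipping (with a warning) analyses that were not run.
load_result() {
    local loader="$1" file="$2"
    if [ ! -s "$file" ]; then
        echo "skipping $loader: $file is missing or empty" >&2
        return 0
    fi
    "$TRINOTATE" "$DB" "$loader" "$file"
}

load_all_and_report() {
    load_result LOAD_swissprot_blastp blastp.outfmt6 || return 1
    load_result LOAD_swissprot_blastx blastx.outfmt6 || return 1
    load_result LOAD_pfam TrinotatePFAM.out || return 1
    load_result LOAD_signalp signalp.out || return 1
    load_result LOAD_tmhmm tmhmm.out || return 1
    # Write the tab-delimited annotation report
    "$TRINOTATE" "$DB" report > trinotate_annotation_report.xls
}
```

Skipping missing files rather than failing lets the same script run when you have only computed some of the analyses; remove the check if you want a missing result to stop the job.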

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=32 --mem=20g --gres=lscratch:20 trinotate.sh
Note: these are suggested values for cpus-per-task and mem. Based on your initial runs, you may need to increase or decrease them.
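Requesting `--cpus-per-task` only helps if the script passes the allocated CPU count on to the multithreaded search tools. A minimal sketch, assuming BLAST+ and HMMER are on your PATH and that a SwissProt BLAST database and a Pfam HMM file exist under the placeholder names uniprot_sprot.pep and Pfam-A.hmm; `threads` and `run_searches` are hypothetical helpers.

```shell
# Hypothetical helper: the number of CPUs Slurm allocated, or 1 outside a job.
threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Pass the CPU count to BLAST+ and HMMER (database/file names are placeholders).
run_searches() {
    blastp -query Trinity.fasta.transdecoder.pep -db uniprot_sprot.pep \
        -num_threads "$(threads)" -max_target_seqs 1 -outfmt 6 -evalue 1e-3 \
        > blastp.outfmt6 || return 1
    hmmscan --cpu "$(threads)" --domtblout TrinotatePFAM.out \
        Pfam-A.hmm Trinity.fasta.transdecoder.pep > pfam.log
}
```

Reading `SLURM_CPUS_PER_TASK` instead of hard-coding a thread count keeps the script in step with whatever value you pass to sbatch.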

signalp, rnammer, and tmhmm

As of February 2020, several Trinotate dependencies are available as separate modules: signalp v4.1, rnammer v1.2 and tmhmm v2.0c. Load them as follows:

module load signalp/4.1
module load rnammer/1.2
module load tmhmm/2.0c
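After loading the modules, SignalP and TMHMM can be run on the TransDecoder peptides to produce the signalp.out and tmhmm.out files that Trinotate loads. The sketch below assumes the modules put commands named `signalp` and `tmhmm` on your PATH; `missing_tools` and `run_signalp_tmhmm` are hypothetical helpers, and the flags follow the Trinotate v3 documentation for SignalP 4.1 and TMHMM 2.0. Remember that SignalP needs lscratch.

```shell
# Hypothetical helper: print the named tools that are not on PATH,
# e.g. because the corresponding module was not loaded.
missing_tools() {
    local tool missing=()
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
    done
    echo "${missing[*]}"
}

run_signalp_tmhmm() {
    local pep="$1" missing
    missing=$(missing_tools signalp tmhmm)
    if [ -n "$missing" ]; then
        echo "not on PATH: $missing (load the modules first)" >&2
        return 1
    fi
    # SignalP 4.1: short output format, predictions written to signalp.out
    signalp -f short -n signalp.out "$pep" || return 1
    # TMHMM reads the protein FASTA on stdin
    tmhmm --short < "$pep" > tmhmm.out
}
```

For example, `run_signalp_tmhmm Trinity.fasta.transdecoder.pep` writes signalp.out and tmhmm.out in the current directory.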