Biowulf High Performance Computing at the NIH
trinotate on Biowulf

Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms. Trinotate uses a number of well-referenced methods for functional annotation, including homology search against known sequence data (BLAST+/SwissProt), protein domain identification (HMMER/PFAM), protein signal peptide and transmembrane domain prediction (SignalP/TMHMM), and lookups in annotation databases (eggNOG/GO/KEGG). All functional annotation data derived from the analysis of transcripts is integrated into a SQLite database, which allows fast, efficient searching for terms with specific qualities related to a desired scientific hypothesis and provides a means to create a whole annotation report for a transcriptome.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session:

The example below runs the runMe.sh script provided with Trinotate. The script downloads the Trinotate SQLite boilerplate database, populates it with the precomputed results provided in the sample data, and generates the final Trinotate annotation report.
[user@biowulf]$ sinteractive --mem=10g --gres=lscratch:10g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load trinotate

[user@cn3144 ~]$ cd /lscratch/$SLURM_JOBID

[user@cn3144 ~]$ cp -r $TRINOTATE_HOME/sample_data . 

[user@cn3144 ~]$ cd sample_data

[user@cn3144 ~]$ ./runMe.sh
edgeR_trans/
edgeR_trans/Trinity_trans.counts.matrix.heatshock_vs_plateau.edgeR.DE_results.samples
edgeR_trans/Trinity_trans.counts.matrix.diauxic_shift_vs_log_growth.diauxic_shift.vs.log_growth.EdgeR.Rscript
edgeR_trans/diffExpr.P0.1_C1.matrix
edgeR_trans/Trinity_trans.counts.matrix.heatshock_vs_log_growth.edgeR.DE_results.P0.1_C1.heatshock-UP.subset
edgeR_trans/clusters_fixed_P_60.heatmap.heatmap.pdf
edgeR_trans/Trinity_trans.counts.matrix.heatshock_vs_log_growth.edgeR.DE_results.MA_n_Volcano.pdf
edgeR_trans/diffExpr.P0.1_C1.matrix.RData
edgeR_trans/Trinity_trans.counts.matrix.log_growth_vs_plateau.edgeR.DE_results.MA_n_Volcano.pdf
edgeR_trans/Trinity_trans.counts.matrix.log_growth_vs_plateau.edgeR.DE_results.samples
[...]

###########################
Generating report table
###########################
#########################################
Extracting Gene Ontology Mappings Per Gene
#########################################
##########################
done. See annotation summary file: Trinotate_report.xls
##########################

[user@cn3144 ~]$ ls -rtl
total 498272
drwxr-x--- 2 user user      4096 Mar  8  2016 edgeR_genes
drwxr-x--- 3 user user     12288 Mar  8  2016 edgeR_trans
-rwxr-x--- 1 user user       755 Aug 22 15:53 cleanme.pl
-rwxr-x--- 1 user user      5678 Aug 22 15:53 runMe.sh
drwxr-x--- 2 user user      4096 Aug 22 15:53 data
-rw-r----- 1 user user  18445845 Aug 22 15:53 Trinotate_report.xls
-rw-r----- 1 user user   6479239 Aug 22 15:54 Trinotate_report.xls.gene_ontology
-rw-r----- 1 user user 478552064 Aug 22 15:54 myTrinotate.sqlite
-rw-r----- 1 user user       544 Aug 22 15:54 Trinotate_report_stats.taxonomy_counts
-rw-r----- 1 user user       177 Aug 22 15:54 Trinotate_report_stats.species_counts
-rw-r----- 1 user user       232 Aug 22 15:54 Trinotate_report_stats.eggnog_counts
-rw-r----- 1 user user       170 Aug 22 15:54 Trinotate_report_stats.eggnog_counts.funcats
-rw-r----- 1 user user    130768 Aug 22 15:54 Trinotate_report_stats.kegg.counts
-rw-r----- 1 user user        11 Aug 22 15:54 Trinotate_report_stats.pfam.counts
-rw-r----- 1 user user   6479239 Aug 22 15:54 Trinotate_report_stats.GO
-rw-r----- 1 user user     47526 Aug 22 15:54 Trinotate_report_stats.GO.slim
-rw-r----- 1 user user     16367 Aug 22 15:54 Trinotate_report_stats.cXp_summary.html

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Trinotate produces Excel and HTML files. The easiest way to view them is to use helixdrive to mount your Biowulf /home or /data area on your desktop and then open the file. In the test job above, the output is in /lscratch/$SLURM_JOBID (temporary local disk on the node), so copy the files you want to keep back to your /data area before exiting the session.
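
The copy-back step described above can be sketched as a small shell function. This is a minimal sketch, not part of Trinotate: the function name save_trinotate_results is hypothetical, and the destination directory under /data/$USER in the commented example is an assumption that you should replace with your own data area.

```shell
#!/bin/bash
# Sketch: copy Trinotate outputs out of node-local scratch before the session ends.
# save_trinotate_results is a hypothetical helper name, not a Trinotate command.
save_trinotate_results() {
    local src="$1"    # directory holding the outputs, e.g. /lscratch/$SLURM_JOBID/sample_data
    local dest="$2"   # destination on shared storage (assumed: a directory in your /data area)

    mkdir -p "$dest" || return 1

    # Copy the report, its companion statistics files, and the HTML summary.
    cp -p "$src"/Trinotate_report* "$dest"/ || return 1

    # The SQLite database is large (about 480 MB in the sample run); copy it only if present.
    if [ -f "$src/myTrinotate.sqlite" ]; then
        cp -p "$src/myTrinotate.sqlite" "$dest"/ || return 1
    fi
}

# Example call inside the interactive session (both paths are assumptions):
# save_trinotate_results "/lscratch/$SLURM_JOBID/sample_data" "/data/$USER/trinotate_sample"
```

Run the function before typing exit, because /lscratch/$SLURM_JOBID is removed when the job ends.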

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. trinotate.sh). For example:

#!/bin/bash
set -e
module load trinotate
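# Replace each placeholder in angle brackets with your own input file.
$TRINOTATE_HOME/Trinotate Trinotate.sqlite init --gene_trans_map <gene_trans_map> --transcript_fasta <transcript_fasta> --transdecoder_pep <transdecoder_pep>
# Add the remaining Trinotate load and report commands here.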

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=32 --mem=20g trinotate.sh
Note: these are suggested values for cpus-per-task and mem. Based on your initial runs, you may need to increase or decrease them.
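
To decide whether to raise or lower these values, it can help to compare what a finished job requested with what it actually used. The sketch below wraps the standard Slurm sacct command; the function name job_usage is hypothetical, and the job ID is whatever sbatch printed at submission.

```shell
#!/bin/bash
# Sketch: summarize the resources a finished Slurm job requested and used.
# job_usage is a hypothetical helper name; pass the job ID as the first argument.
job_usage() {
    local jobid="$1"
    if [ -z "$jobid" ]; then
        echo "usage: job_usage <jobid>" >&2
        return 1
    fi
    # ReqMem and AllocCPUS show what was requested and allocated;
    # MaxRSS shows the peak memory recorded for each job step.
    sacct -j "$jobid" --format=JobID,State,Elapsed,AllocCPUS,ReqMem,MaxRSS
}
```

If MaxRSS is far below ReqMem, a smaller --mem value is likely sufficient for similar inputs; if the job was killed for exceeding memory, increase it.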