Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes from model or non-model organisms. Trinotate makes use of a number of well-referenced methods for functional annotation, including homology search against known sequence data (BLAST+/SwissProt), protein domain identification (HMMER/Pfam), protein signal peptide and transmembrane domain prediction (SignalP/tmHMM), and leveraging various annotation databases (eggNOG/GO/KEGG). All functional annotation data derived from the analysis of transcripts are integrated into an SQLite database, which allows fast, efficient searching for terms relevant to a specific scientific hypothesis, as well as generation of a whole annotation report for a transcriptome.
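Once the database has been populated, one common way to search it is to export the whole annotation report and filter it. As a minimal sketch, assuming the older-style Trinotate command line used later on this page and an illustrative output filename:

    # generate the full tab-delimited annotation report from the SQLite database
    Trinotate Trinotate.sqlite report > Trinotate_annotation_report.tsv
    # filter the report for transcripts annotated with a term of interest (example term)
    grep -i "kinase" Trinotate_annotation_report.tsv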
Users MUST allocate lscratch to run Trinotate. This is because its dependency, SignalP, requires a temporary directory that defaults to lscratch.
To use the Trinotate Webserver, make sure to set up ssh tunneling. See interactive session below.
The example below runs the runMe.Biowulf.sh script created specifically for Biowulf. Executing this script first extracts the reference dataset from $TRINOTATE_DATA_TAR, then runs the computation, generates reports, and sets up data for the Trinotate Webserver.
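If you want to stage or inspect the reference data yourself, the tarball pointed to by $TRINOTATE_DATA_TAR can be unpacked into lscratch by hand; this is only a sketch, since runMe.Biowulf.sh performs this extraction for you:

    # extract the Trinotate reference data into your lscratch allocation
    tar -xf $TRINOTATE_DATA_TAR -C /lscratch/$SLURM_JOBID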
Allocate an interactive session with lscratch and the --tunnel option.
Sample session (user input in bold):
[user@biowulf]$ sinteractive --mem=10g --gres=lscratch:100 --tunnel
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

Created 1 generic SSH tunnel(s) from this compute node to biowulf for your use
at port numbers defined in the $PORTn ($PORT1, ...) environment variables.

Please create a SSH tunnel from your workstation to these ports on biowulf.
On Linux/MacOS, open a terminal and run:

    ssh -L 45000:localhost:45000 biowulf.nih.gov

For Windows instructions, see https://hpc.nih.gov/docs/tunneling
Use the instructions above to set up an SSH tunnel in a separate terminal. See ssh tunneling for more information.
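As a sketch of what that looks like in practice: the port assigned on the compute node is stored in $PORT1, and the matching command on your local workstation forwards that port to biowulf (here 45000 and user are placeholders; substitute the value of $PORT1 and your own username):

    # on the compute node: show the assigned tunnel port
    echo $PORT1
    # on your local workstation: forward that port to biowulf
    ssh -L 45000:localhost:45000 user@biowulf.nih.gov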
[user@cn3144 ~]$ module load trinotate
[user@cn3144 ~]$ cd /lscratch/$SLURM_JOBID
[user@cn3144 46116226]$ cp -r $TRINOTATE_TEST_DATA .
[user@cn3144 46116226]$ cd test_data
[user@cn3144 test_data]$ ./runMe.Biowulf.sh
[user@cn3144 test_data]$ run_TrinotateWebserver.pl
Copy the URL provided by the last command above and paste it into your browser. Once you are done examining your data, press Ctrl+C in the interactive session to terminate the Trinotate Webserver.
[user@cn3144 test_data]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Trinotate also produces spreadsheets and HTML files. The easiest way to view them is to use hpcdrive to mount your Biowulf /home or /data area onto your desktop, then click on the file. For the test job above, since the output is in /lscratch/$SLURM_JOBID (temporary local disk on the node), you should copy the desired files back to your /data area before exiting the session.
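For example, assuming a destination directory under your /data area (the path below is only an illustration):

    # copy results from lscratch back to a permanent location before the job ends
    mkdir -p /data/$USER/trinotate_results
    cp -r /lscratch/$SLURM_JOBID/test_data /data/$USER/trinotate_results/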
Create a batch input file (e.g. trinotate.sh). For example:
#!/bin/bash
set -e
module load trinotate
Trinotate Trinotate.sqlite init --gene_trans_map <gene_to_trans_map> \
    --transcript_fasta <transcripts.fasta> \
    --transdecoder_pep <transdecoder.pep>
# ... further Trinotate commands ...
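The further Trinotate commands indicated above depend on your analysis. As a hedged sketch, assuming the older-style Trinotate subcommands and that the BLAST+, HMMER/Pfam, SignalP, and tmHMM searches have already been run (all input and output filenames here are illustrative):

    # load pre-computed search results into the database
    Trinotate Trinotate.sqlite LOAD_swissprot_blastp blastp.outfmt6
    Trinotate Trinotate.sqlite LOAD_swissprot_blastx blastx.outfmt6
    Trinotate Trinotate.sqlite LOAD_pfam TrinotatePFAM.out
    Trinotate Trinotate.sqlite LOAD_tmhmm tmhmm.out
    Trinotate Trinotate.sqlite LOAD_signalp signalp.out
    # generate the final annotation report
    Trinotate Trinotate.sqlite report > Trinotate_annotation_report.tsv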
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=4 --mem=20g --gres=lscratch:25 trinotate.sh

Note: these are suggested values for cpus-per-task and mem. Based on your initial runs, you may need to increase or decrease them.
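To judge whether those values need adjusting, it can help to review what an earlier run actually used, for example with the Biowulf jobhist utility (the job ID below is a placeholder):

    jobhist 46116226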