Biowulf High Performance Computing at the NIH
tetoolkit on Biowulf

A package for including transposable elements in differential enrichment analysis of sequencing datasets.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ module load tetoolkit
[user@cn3144]$ cp $TETOOLKIT_TEST_DATA/testdata_SE/* .
[user@cn3144]$ cp $TETOOLKIT_TEST_DATA/testdata_GTF/dm3* .
[user@cn3144]$ gunzip *.gz
[user@cn3144]$ TEtranscripts --sortByPos --mode multi \
  --TE dm3_rmsk_TE.gtf --GTF dm3_refGene.gtf \
  --project singleEnd_test -t test_data_SE_treatment.bam \
  -c test_data_SE_control.bam
...
[user@cn3144]$ ls -1
dm3_refGene.gtf
dm3_rmsk_TE.gtf
singleEnd_test.cntTable
singleEnd_test_DESeq.R
singleEnd_test_gene_TE_analysis.txt
singleEnd_test_sigdiff_gene_TE.txt
test_data_SE_control.bam
test_data_SE_treatment.bam
[user@cn3144]$ head singleEnd_test.cntTable
gene/TE test_data_SE_treatment.bam.T    test_data_SE_control.bam.C
"128up" 32      15
"14-3-3epsilon" 471     442
"14-3-3zeta"    449     382
"140up" 5       5
"18w"   15      11
"26-29-p"       34      43
"2mit"  0       0
"312"   2       3
"4EHP"  14      24

[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
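
A quick sanity check on the count table is to see how many features received any reads. The following is a minimal sketch, assuming the table is tab-separated with one header line, as the output above suggests. So that it runs on its own, it first recreates the nine rows shown above in a hypothetical file named cntTable_head.tsv; in a real session, set table to singleEnd_test.cntTable instead.

```shell
# Recreate the rows shown above (assumed tab-separated) so the sketch is self-contained.
# In a real session, use: table=singleEnd_test.cntTable
table=cntTable_head.tsv
printf '%s\t%s\t%s\n' \
  'gene/TE' 'test_data_SE_treatment.bam.T' 'test_data_SE_control.bam.C' \
  '"128up"' 32 15 \
  '"14-3-3epsilon"' 471 442 \
  '"14-3-3zeta"' 449 382 \
  '"140up"' 5 5 \
  '"18w"' 15 11 \
  '"26-29-p"' 34 43 \
  '"2mit"' 0 0 \
  '"312"' 2 3 \
  '"4EHP"' 14 24 > "$table"

# Count all features, skipping the header line.
total=$(awk 'NR > 1 { n++ } END { print n + 0 }' "$table")

# Count features where treatment + control counts are above zero.
nonzero=$(awk -F'\t' 'NR > 1 && ($2 + $3) > 0 { n++ } END { print n + 0 }' "$table")

echo "features with at least one read: $nonzero of $total"
# prints: features with at least one read: 8 of 9
```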

Batch job
Most jobs should be run as batch jobs.

Create a batch script (e.g. tetoolkit.sh). For example:

#!/bin/bash
# this is tetoolkit.sh

module load tetoolkit/1.5.1 || exit 1
TEtranscripts --sortByPos --mode multi --verbose \
  --TE dm3_rmsk_TE.gtf --GTF dm3_refGene.gtf \
  --project singleEnd_test -t test_data_SE_treatment.bam \
  -c test_data_SE_control.bam 2> log

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] tetoolkit.sh
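
As an illustration of filling in the bracketed options, the sketch below builds a concrete submission command. The values 2 CPUs and 8g of memory are placeholders chosen for this example, not measured requirements for TEtranscripts; adjust them to what a test run actually uses. The command is only printed here, so the sketch can be tried anywhere; on a Biowulf login node, run the printed line to submit.

```shell
# Placeholder resource requests (assumptions for illustration only).
cpus=2
mem=8g

# Assemble the sbatch command for the batch script shown above.
cmd="sbatch --cpus-per-task=$cpus --mem=$mem tetoolkit.sh"

# Print instead of executing, so nothing is submitted by accident.
echo "$cmd"
```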

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. tetoolkit.swarm). For example:

TEtranscripts --sortByPos --mode multi \
  --TE dm3_rmsk_TE.gtf --GTF dm3_refGene.gtf \
  --project singleEnd_test1 -t treatment1.bam -c control1.bam
TEtranscripts --sortByPos --mode multi \
  --TE dm3_rmsk_TE.gtf --GTF dm3_refGene.gtf \
  --project singleEnd_test2 -t treatment2.bam -c control2.bam
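
With more than a few sample pairs, the swarmfile can be generated rather than typed by hand. The following is a minimal sketch, assuming the BAM files follow the treatmentN.bam / controlN.bam naming used above and that there are three pairs (both are assumptions for illustration). Each command is written on a single line, which is equivalent to the backslash-continued form shown above.

```shell
swarmfile=tetoolkit.swarm

# Options shared by every command, copied from the example above.
common='TEtranscripts --sortByPos --mode multi --TE dm3_rmsk_TE.gtf --GTF dm3_refGene.gtf'

# Start from an empty swarmfile.
: > "$swarmfile"

# Hypothetical sample numbers; replace with your own list.
for i in 1 2 3; do
  # One line per sample pair; each line becomes one swarm subjob.
  printf '%s --project singleEnd_test%s -t treatment%s.bam -c control%s.bam\n' \
    "$common" "$i" "$i" "$i" >> "$swarmfile"
done

# Show the generated commands for review before submitting.
cat "$swarmfile"
```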

Submit this job using the swarm command.

swarm -f tetoolkit.swarm [-g #] [-t #] --module tetoolkit/1.5.1

where
-g # Number of gigabytes of memory required for each process (one line in the swarm command file).
-t # Number of threads/CPUs required for each process (one line in the swarm command file).
--module tetoolkit/1.5.1 Loads the tetoolkit module (version 1.5.1) for each subjob in the swarm.