High-Performance Computing at the NIH
Trimmomatic on Biowulf & Helix

Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data. The selection of trimming steps and their associated parameters is supplied on the command line. Trimmomatic was developed at the Usadel lab in Aachen, Germany.

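The current trimming steps are:

ILLUMINACLIP: Cut adapter and other Illumina-specific sequences from the read.
SLIDINGWINDOW: Perform a sliding window trimming, cutting once the average quality within the window falls below a threshold.
LEADING: Cut bases off the start of a read, if below a threshold quality.
TRAILING: Cut bases off the end of a read, if below a threshold quality.
CROP: Cut the read to a specified length.
HEADCROP: Cut the specified number of bases from the start of the read.
MINLEN: Drop the read if it is below a specified length.
TOPHRED33: Convert quality scores to phred+33.
TOPHRED64: Convert quality scores to phred+64.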

It works with FASTQ files (using phred+33 or phred+64 quality scores, depending on the Illumina pipeline used), either uncompressed or gzipped. Gzip format is detected from the .gz extension.

For single-end data, one input file and one output file are specified, plus the processing steps. For paired-end data, two input files and four output files are specified: two for the 'paired' output, where both reads survived the processing, and two for the corresponding 'unpaired' output, where one read survived but its partner did not.
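The sample sessions on this page all use paired-end mode, so here is a minimal single-end sketch for comparison. The file names are hypothetical, the adapter file TruSeq3-SE.fa is assumed to be reachable from the working directory, and $TRIMMOJAR is assumed to have been set by loading the module.

```shell
# Hypothetical file names; TRIMMOJAR is set by 'module load trimmomatic'.
reads=input.fq.gz
trimmed=output_trimmed.fq.gz
mode=SE   # single-end: one input file, one output file
steps="ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36"

if [ -f "$reads" ]; then
    # $steps is left unquoted on purpose so that each step becomes its own argument
    java -jar "$TRIMMOJAR" "$mode" -phred33 "$reads" "$trimmed" $steps
else
    echo "input file $reads not found; nothing to trim" >&2
fi
```

Because both file names end in .gz, the input would be read as gzip and the output written as gzip.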

Use the module commands to set up Trimmomatic, as in the example below. Loading the module sets up an alias called 'trimmomatic', which is equivalent to 'java -classpath /usr/local/apps/trimmomatic/Trimmomatic-<version>/trimmomatic-<version>.jar'. The module also sets the environment variables 'TRIMMOJAR' and 'TRIMMOMATIC_JAR', which point to the location of the Trimmomatic jar file. The variable 'TRIMMOMATIC_JARPATH' points to the directory in which the jar file is located.

FASTA files of adapter sequences are included with Trimmomatic and can be found at /usr/local/apps/trimmomatic/Trimmomatic-<version>/adapters
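One way to avoid copying adapter files into the working directory is to give ILLUMINACLIP a full path. The sketch below assumes that the adapters directory sits directly inside the directory named by $TRIMMOMATIC_JARPATH, which matches the paths quoted above but should be checked on the system. The fallback /path/to/trimmomatic is a placeholder, not a real location.

```shell
# Assumption: the adapters directory is located next to the jar file.
# /path/to/trimmomatic is a placeholder used when the module is not loaded.
adapters="${TRIMMOMATIC_JARPATH:-/path/to/trimmomatic}/adapters"

# List the available adapter files, or report that the directory is missing
ls "$adapters" 2>/dev/null || echo "no adapters directory at $adapters" >&2

# Build an ILLUMINACLIP step that refers to the adapter file by full path
clip="ILLUMINACLIP:${adapters}/TruSeq3-PE.fa:2:30:10"
echo "$clip"
```

The resulting $clip value can be used in place of the ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 argument in the commands on this page.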

On Helix

Sample session:

helix% module avail trimmomatic

----------------- /usr/local/lmod/modulefiles------------------
  trimmomatic/0.33
  
helix% module load trimmomatic

helix%  module list
Currently Loaded Modulefiles:
  1) trimmomatic/0.33
 
helix%  java -jar $TRIMMOJAR PE -phred33 \
    input_forward.fq.gz input_reverse.fq.gz \
    output_forward_paired.fq.gz output_forward_unpaired.fq.gz \
    output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
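The trimming steps in the command above pack several numbers into each argument. The following sketch spells them out as named variables and reassembles the same step string. The variable names are illustrative only, and the parameter meanings are summarized from the Trimmomatic documentation, which remains the authoritative reference.

```shell
# Variable names are illustrative; see the Trimmomatic documentation for details.
adapters=TruSeq3-PE.fa   # FASTA file of adapter sequences
seed_mismatches=2        # mismatches allowed in the initial seed match
palindrome_thresh=30     # score required for a paired-end "palindrome" adapter match
simple_thresh=10         # score required for a simple adapter match
leading=3                # remove leading bases with quality below 3
trailing=3               # remove trailing bases with quality below 3
window=4                 # sliding window width in bases
window_qual=15           # cut once the average quality in the window drops below 15
minlen=36                # drop reads shorter than 36 bases after trimming

steps="ILLUMINACLIP:${adapters}:${seed_mismatches}:${palindrome_thresh}:${simple_thresh}"
steps="$steps LEADING:${leading} TRAILING:${trailing}"
steps="$steps SLIDINGWINDOW:${window}:${window_qual} MINLEN:${minlen}"
echo "$steps"
```

Steps are applied in the order in which they appear on the command line, so adapter clipping runs before the quality trimming here.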

Running a Trimmomatic batch job

Set up a batch script similar to the one below:

#!/bin/bash
# --- this script is called trim.bat -----

ml trimmomatic || exit 1
java -Djava.io.tmpdir=. -jar $TRIMMOJAR PE -phred33 -threads $SLURM_CPUS_PER_TASK \
    SRR292678_1.fastq.gz SRR292678_2.fastq.gz \
    output_forward_paired.fq.gz output_forward_unpaired.fq.gz \
    output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 \
    SLIDINGWINDOW:4:15 MINLEN:36

Submit this job with:

sbatch --mem=4G trim.bat

This submits the job with 1 core (2 CPUs) and 4 GB of memory. If you need more than 4 GB of memory, increase the value of the --mem flag, as in the example below.

sbatch --mem=6g trim.bat
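The batch script passes $SLURM_CPUS_PER_TASK to -threads, so the number of Trimmomatic threads follows the CPU allocation of the job. A minimal sketch of a defensive variant for use inside the script is shown below. The fallback of 2 threads is an assumption, chosen to match the default allocation of 1 core (2 CPUs).

```shell
# Use the Slurm CPU allocation if available, otherwise fall back to 2 threads
# (assumed default; adjust when running outside a Slurm job).
threads="${SLURM_CPUS_PER_TASK:-2}"
echo "Trimmomatic will be started with -threads $threads"

# To request more CPUs and memory at submission time, for example:
#   sbatch --cpus-per-task=8 --mem=6g trim.bat
```

Inside trim.bat, -threads "$threads" would then replace -threads $SLURM_CPUS_PER_TASK.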

Running a swarm of trimmomatic jobs

If you have a large number of Trimmomatic jobs to run, swarm is a convenient way to submit them. Create a swarm command file called, say, trim.swarm, similar to the one below:

java -jar $TRIMMOJAR PE -threads 8 -phred33 input1a input1b [...]
java -jar $TRIMMOJAR PE -threads 8 -phred33 input2a input2b [...]
java -jar $TRIMMOJAR PE -threads 8 -phred33 input3a input3b [...]
[...etc....]

Submit this swarm with:

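swarm -f trim.swarm -t 8 --module trimmomatic

The -t 8 option allocates 8 CPUs to each command, matching the -threads 8 setting in the swarm command file.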
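For many samples, the swarm command file can be generated rather than typed by hand. The sketch below is one possible approach, not the documented method. It assumes a hypothetical naming scheme of <sample>_1.fastq.gz and <sample>_2.fastq.gz, and the function name make_trim_swarm and the output suffixes (_1P, _1U, _2P, _2U) are made up for illustration.

```shell
# make_trim_swarm DIR SWARMFILE
# Writes one paired-end Trimmomatic command per sample found in DIR.
# Assumes inputs are named <sample>_1.fastq.gz and <sample>_2.fastq.gz.
make_trim_swarm() {
    dir=$1
    out=$2
    # Single quotes keep the variables literal so they expand when the job runs
    pre='java -jar $TRIMMOJAR PE -threads $SLURM_CPUS_PER_TASK -phred33'
    steps='ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36'
    : > "$out"                            # start with an empty swarm file
    for r1 in "$dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue          # skip the literal pattern if nothing matched
        s=${r1%_1.fastq.gz}               # strip the suffix to get the sample prefix
        printf '%s\n' "$pre ${s}_1.fastq.gz ${s}_2.fastq.gz ${s}_1P.fq.gz ${s}_1U.fq.gz ${s}_2P.fq.gz ${s}_2U.fq.gz $steps" >> "$out"
    done
}

make_trim_swarm . trim.swarm
```

The generated file could then be submitted with, for example, swarm -f trim.swarm -t 8 --module trimmomatic, so that each command receives 8 CPUs through $SLURM_CPUS_PER_TASK.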

Running an interactive Trimmomatic job

First allocate an interactive node as in the example below.

biowulf% sinteractive 
salloc.exe: Granted job allocation 142543
[user@cn0043 ~]$

[user@cn0043 ~]$ module load trimmomatic

[user@cn0043 ~]$ module list
Currently Loaded Modulefiles:
  1) trimmomatic/0.25
  

[user@cn0043 ~]$ java -Xmx5g -jar $TRIMMOJAR PE \
    s_1_1_sequence.txt.gz s_1_2_sequence.txt.gz \
    lane1_forward_paired.fq.gz lane1_forward_unpaired.fq.gz \
    lane1_reverse_paired.fq.gz lane1_reverse_unpaired.fq.gz \
    ILLUMINACLIP:TruSeq2-PE.fa:2:40:15 \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36


[user@cn0043 ~]$ exit
salloc.exe: Relinquishing job allocation 142543

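The sample command above sets the Java heap size to 5 GB with -Xmx5g (larger than the default of 1 GB set by the trimmomatic alias) and uses the $TRIMMOJAR environment variable. The interactive allocation itself must provide at least that much memory.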
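Rather than hard-coding the heap size, it can be derived from the memory allocated to the job. The sketch below assumes that Slurm exports SLURM_MEM_PER_NODE (in MB) when memory is requested with --mem, for example via sinteractive --mem=6g, and it leaves 1 GB of headroom for the JVM itself. Both the headroom and the 6144 MB fallback are illustrative choices.

```shell
# Memory allocated to the job in MB; 6144 is an assumed fallback for illustration
mem_mb="${SLURM_MEM_PER_NODE:-6144}"

# Leave 1 GB of headroom for the JVM outside the heap (assumed margin)
heap_mb=$((mem_mb - 1024))
if [ "$heap_mb" -lt 512 ]; then
    heap_mb=512   # keep a small minimum heap for very small allocations
fi

# Print the start of the resulting command line; PE arguments follow as usual
echo "java -Xmx${heap_mb}m -jar \$TRIMMOJAR PE ..."
```

With a 6 GB allocation this yields -Xmx5120m, which is equivalent to the -Xmx5g used in the session above.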

Documentation

Trimmomatic website