sailfish on Biowulf

Sailfish quantifies the expression of a given set of transcripts from NGS reads. It runs in two stages: (1) an indexing step, run once per set of transcripts, and (2) a quantification step, run once per sample.
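
The two stages can be sketched as a small dry-run helper that only prints the sailfish commands it would run. The function name plan_sailfish and the file names (transcripts.fa, transcripts.idx, sampleA.fq.gz, sampleB.fq.gz) are hypothetical placeholders; the flags are the ones used in the examples on this page, and the unstranded library type (-l U) is an assumption that has to match the actual data.

```shell
#!/bin/bash
# Dry-run sketch of the two-stage workflow: prints commands, runs nothing.
# All file names passed in below are hypothetical placeholders.

plan_sailfish() {
    local transcripts=$1 index=$2
    shift 2

    # Stage 1: index once per transcript set (skipped if the index path exists)
    if [[ ! -e $index ]]; then
        echo "sailfish index -t $transcripts -o $index"
    fi

    # Stage 2: quantify once per sample
    local fq name
    for fq in "$@"; do
        name=$(basename "$fq" .fq.gz)
        echo "sailfish quant -i $index -l U -r <(zcat $fq) -o quant_$name"
    done
}

plan_sailfish transcripts.fa transcripts.idx sampleA.fq.gz sampleB.fq.gz
```

When the index path does not exist yet, this prints one index command followed by one quant command per sample; once the index is in place, only the quant commands are printed.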


Interactive job
Allocate an interactive session and run sailfish. Sample session:

[user@biowulf]$ sinteractive --mem=8g --cpus-per-task=4 --gres=lscratch:20
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load sailfish
[user@cn3144]$ zcat $SAILFISH_TEST_DATA/gencode.vM9.transcripts.fa.gz > M9.fa
[user@cn3144]$ sailfish index -t M9.fa -o M9.idx -p $SLURM_CPUS_PER_TASK
[user@cn3144]$ cp $SAILFISH_TEST_DATA/ENCFF138LJO.fastq.gz .
[user@cn3144]$ sailfish quant -i M9.idx -r <(zcat ENCFF138LJO.fastq.gz) --libType U \
                      -o quant -p $SLURM_CPUS_PER_TASK

[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226

Batch job
Create a batch script. For example:

#! /bin/bash

module load sailfish/0.10.0 || exit 1
# remember the submission directory so results can be copied back
wd=$PWD
cd /lscratch/$SLURM_JOB_ID || exit 1

# get transcriptome from the example directory
# This is usually done once - not for each job. Only included
# here to show all steps involved in sailfish quantitation.
zcat $SAILFISH_TEST_DATA/gencode.vM9.transcripts.fa.gz \
    > gencode.vM9.transcripts.fa

# index the transcripts
sailfish index -t gencode.vM9.transcripts.fa -o gencode.vM9.idx \
    -p $SLURM_CPUS_PER_TASK

# copy the example reads to local scratch
cp $SAILFISH_TEST_DATA/ENCFF138LJO.fastq.gz .

# quantify the transcripts
sailfish quant -i gencode.vM9.idx -l U \
    -r <(zcat ENCFF138LJO.fastq.gz) \
    -o quant -p $SLURM_CPUS_PER_TASK

# copy the results back to the submission directory
cp -r quant "$wd"

Submit the job with sbatch, appending the name of the batch script to this command:

sbatch --cpus-per-task=8 --mem=8g --gres=lscratch:16

Swarm of jobs
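Create a swarmfile (e.g. sailfish.swarm) with one quantification command per sample. The index (gencode.vM9.idx here) must already have been built. For example: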

sailfish quant -i gencode.vM9.idx -l U -r <(zcat sample1.fq.gz) \
    -o quant_sample1 -p $SLURM_CPUS_PER_TASK
sailfish quant -i gencode.vM9.idx -l U -r <(zcat sample2.fq.gz) \
    -o quant_sample2 -p $SLURM_CPUS_PER_TASK
sailfish quant -i gencode.vM9.idx -l U -r <(zcat sample3.fq.gz) \
    -o quant_sample3 -p $SLURM_CPUS_PER_TASK

Submit the swarm with:

swarm -f sailfish.swarm -g 8 -t 8 --module sailfish/0.10.0

where
-g #                Number of gigabytes of memory required for each process (one command in the swarmfile)
-t #                Number of threads/CPUs required for each process (one command in the swarmfile)
--module sailfish   Loads the sailfish module for each subjob in the swarm
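
For more than a handful of samples, the swarmfile can be generated rather than typed. The following is a minimal sketch, assuming the reads are named like the examples above (one *.fq.gz file per sample) and are unstranded (-l U); the function name make_swarm is hypothetical. The single-quoted format string keeps $SLURM_CPUS_PER_TASK unexpanded, so it is resolved inside each subjob rather than when the swarmfile is written.

```shell
#!/bin/bash
# Sketch: write one sailfish quant command per FASTQ file to a swarmfile.
# make_swarm is a hypothetical helper; file names follow the example above.

make_swarm() {
    local index=$1
    shift
    local fq name
    for fq in "$@"; do
        # output directory is named after the sample, e.g. quant_sample1
        name=$(basename "$fq" .fq.gz)
        printf 'sailfish quant -i %s -l U -r <(zcat %s) -o quant_%s -p $SLURM_CPUS_PER_TASK\n' \
            "$index" "$fq" "$name"
    done
}

make_swarm gencode.vM9.idx sample1.fq.gz sample2.fq.gz sample3.fq.gz > sailfish.swarm
```

Each command is written on a single line, which is equivalent to the continued two-line form shown in the swarmfile above.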