High-Performance Computing at the NIH
sailfish on Biowulf & Helix

Description

Sailfish quantifies the expression of a given set of transcripts from NGS reads. It runs in two stages: (1) an indexing step, run once per set of transcripts, and (2) a quantification step, run once per sample.

Multiple versions of sailfish may be available. The easiest way to select a version is to use modules. To see the available modules, type

module avail sailfish 

To select a module, use

module load sailfish/[version]

where [version] is the version of choice.

sailfish is a multithreaded application. Make sure the number of threads (-p) matches the number of CPUs requested.
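One way to keep the two in sync is to derive the thread count from the Slurm allocation instead of hard-coding it. The sketch below assumes that Slurm exports `SLURM_CPUS_PER_TASK` when `--cpus-per-task` is given, and it falls back to a single thread when the variable is unset (the fallback value is an assumption). The helper name `sailfish_threads` and the file `reads.fq` in the usage comment are hypothetical.

```shell
# Hypothetical helper: print the number of threads sailfish should use.
# Uses the CPUs allocated by Slurm (--cpus-per-task); assumes 1 thread
# when SLURM_CPUS_PER_TASK is unset, e.g. outside a job allocation.
sailfish_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Usage inside a job (reads.fq is a placeholder):
#   sailfish quant -i gencode.vM9.idx -l U -r reads.fq -o quant -p "$(sailfish_threads)"
```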

On Helix

Sailfish should not be run on Helix.
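A script can refuse to start when it is not running inside a Slurm job, which catches accidental runs on Helix. This is a minimal sketch that assumes `SLURM_JOB_ID` is set in the environment of every Biowulf batch, swarm, and interactive job (Slurm exports it for job allocations); the function name `require_slurm_job` is hypothetical.

```shell
# Hypothetical guard: return non-zero unless running inside a Slurm job.
# Assumes Slurm sets SLURM_JOB_ID for batch, swarm and interactive jobs.
require_slurm_job() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "error: run sailfish inside a Biowulf job (sbatch, swarm or sinteractive)" >&2
        return 1
    fi
}

# Usage near the top of a script:
#   require_slurm_job || exit 1
```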

Batch job on Biowulf

Create a batch script similar to the following example:

#! /bin/bash
# this file is sailfish.batch

module load sailfish || exit 1

# get transcriptome from the example directory
zcat /usr/local/apps/sailfish/TEST_DATA/gencode.vM9.transcripts.fa.gz \
    > gencode.vM9.transcripts.fa

# index the transcripts
sailfish index -t gencode.vM9.transcripts.fa -o gencode.vM9.idx \
    -p $SLURM_CPUS_PER_TASK

# quantify the transcripts
sailfish quant -i gencode.vM9.idx -l U \
    -r <(zcat /usr/local/apps/sailfish/TEST_DATA/ENCFF528MAS.fastq.gz) \
    -o quant -p $SLURM_CPUS_PER_TASK

Submit to the queue with sbatch:

biowulf$ sbatch --cpus-per-task 8 --mem 8g sailfish.batch

Note that the indexing step only has to be run once per transcript set; the resulting index can be reused for all samples.
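Because the index can be reused, a batch script can skip indexing when the index directory already exists. This is a sketch under a simplifying assumption: it treats the presence of the output directory as proof of a complete index, although a crashed run could leave a partial one behind. The function name `build_index_once` is hypothetical, and it uses only the `sailfish index` options from the batch script above.

```shell
# Hypothetical helper: build a sailfish index only if it is not already there.
# $1 = transcript FASTA, $2 = index directory, $3 = number of threads.
# Assumes an existing directory means a complete, reusable index.
build_index_once() {
    local transcripts=$1 index_dir=$2 threads=$3
    if [ -d "$index_dir" ]; then
        echo "reusing existing index $index_dir" >&2
        return 0
    fi
    sailfish index -t "$transcripts" -o "$index_dir" -p "$threads"
}

# Usage in sailfish.batch:
#   build_index_once gencode.vM9.transcripts.fa gencode.vM9.idx "$SLURM_CPUS_PER_TASK"
```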

Swarm of jobs on Biowulf

Create a swarm command file similar to the following example, which uses an existing index:

# this file is sailfish.swarm
sailfish quant -i gencode.vM9.idx -l U -r <(zcat sample1.fq.gz) \
    -o quant_sample1 -p $SLURM_CPUS_PER_TASK
sailfish quant -i gencode.vM9.idx -l U -r <(zcat sample2.fq.gz) \
    -o quant_sample2 -p $SLURM_CPUS_PER_TASK
sailfish quant -i gencode.vM9.idx -l U -r <(zcat sample3.fq.gz) \
    -o quant_sample3 -p $SLURM_CPUS_PER_TASK

Submit it to the queue with swarm:

biowulf$ swarm -f sailfish.swarm --module sailfish -g 8 -t 8
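With many samples, the swarm file can be generated rather than written by hand. A minimal sketch, assuming single-end reads named `<sample>.fq.gz` (as in the example swarm file) and one command per line; the function name `make_sailfish_swarm` is hypothetical. The `$SLURM_CPUS_PER_TASK` reference and the process substitution are written literally, so they are evaluated inside each swarm subjob rather than when the file is generated.

```shell
# Hypothetical generator: print one "sailfish quant" swarm line per sample.
# Assumes single-end reads named <sample>.fq.gz and an existing index.
# $1 = index directory, remaining args = gzipped FASTQ files.
make_sailfish_swarm() {
    local index=$1 fq sample
    shift
    for fq in "$@"; do
        sample=$(basename "$fq" .fq.gz)
        # Single quotes keep $SLURM_CPUS_PER_TASK unexpanded until the subjob runs.
        printf 'sailfish quant -i %s -l U -r <(zcat %s) -o quant_%s -p $SLURM_CPUS_PER_TASK\n' \
            "$index" "$fq" "$sample"
    done
}

# Usage:
#   make_sailfish_swarm gencode.vM9.idx *.fq.gz > sailfish.swarm
#   swarm -f sailfish.swarm --module sailfish -g 8 -t 8
```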

Interactive job on Biowulf

Allocate an interactive session with sinteractive and run sailfish as shown below:

biowulf$ sinteractive --cpus-per-task=8 --mem=8g
node$ module load sailfish
node$ zcat /usr/local/apps/sailfish/TEST_DATA/gencode.vM9.transcripts.fa.gz \
    > gencode.vM9.transcripts.fa
node$ sailfish index -t gencode.vM9.transcripts.fa -o gencode.vM9.idx \
    -p $SLURM_CPUS_PER_TASK
node$ sailfish quant -i gencode.vM9.idx -l U \
    -r <(zcat /usr/local/apps/sailfish/TEST_DATA/ENCFF528MAS.fastq.gz) \
    -o quant -p $SLURM_CPUS_PER_TASK
node$ exit
biowulf$