High-Performance Computing at the NIH
iSAAC on Helix/Biowulf

iSAAC (Isaac Genome Alignment Software) is an ultrafast DNA sequence aligner that takes advantage of high-memory hardware (>48 GB). It is paired with a variant caller (Isaac Variant Caller).

iSAAC reference paper

iSAAC2 on GitHub

On Helix

iSAAC is a CPU-, memory-, and I/O-intensive application that is better suited to the Biowulf cluster than to Helix.

Batch job on Biowulf

Create a batch input file (e.g. run.sh). The following example uses the demo data provided with iSAAC.

A large alignment generates substantial I/O, so use local scratch on the allocated node for temporary files, as in the example below.

#!/bin/bash
# submit with: sbatch --mem=50g --cpus-per-task=32 --gres=lscratch:50 run.sh

cd /data/$USER/iSAAC
module load iSAAC/02.16.03.09

isaac-sort-reference \
    -g  $ISAAC_HOME/share/iSAAC-02.16.03.09/data/examples/PhiX/iGenomes/PhiX/NCBI/1993-04-28/Sequence/Chromosomes/phix.fa \
    -o ./PhiX

isaac-align \
    -r ./PhiX/sorted-reference.xml \
    -b $ISAAC_HOME/share/*/data/examples/PhiX/Fastq \
    -f fastq \
    --use-bases-mask y150,y150 \
    --variable-read-length yes -m10 \
    -j $SLURM_CPUS_PER_TASK \
    -t /lscratch/$SLURM_JOBID

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=32 --mem=50g --gres=lscratch:50 run.sh
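
The batch script above passes /lscratch/$SLURM_JOBID to isaac-align without checking that it exists, which is only the case when local scratch was requested with --gres=lscratch:N. A minimal sketch of a guard that could be added to run.sh follows. The function name require_lscratch and its optional argument are hypothetical and are not part of iSAAC or Slurm.

```shell
#!/bin/bash
# Sketch: print the per-job local scratch directory, or fail with a message.
# require_lscratch is a hypothetical helper, not an iSAAC or Slurm command.
# Usage: require_lscratch [scratch_root]   (scratch_root defaults to /lscratch)
require_lscratch() {
    local root="${1:-/lscratch}"
    # SLURM_JOBID is set by Slurm inside a running job.
    if [ -z "${SLURM_JOBID:-}" ]; then
        echo "error: SLURM_JOBID is not set; not running inside a Slurm job" >&2
        return 1
    fi
    local dir="$root/$SLURM_JOBID"
    # The directory should exist only if local scratch was allocated to the job.
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "error: $dir is missing or not writable; was --gres=lscratch:N requested?" >&2
        return 1
    fi
    printf '%s\n' "$dir"
}

# Possible use in run.sh, before calling isaac-align:
#   tmpdir=$(require_lscratch) || exit 1
#   isaac-align ... -t "$tmpdir"
```

Failing early in this way avoids starting a long alignment that would otherwise stop when it first tries to write temporary files.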

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. isaac.swarm). For example:

cd /data/$USER/dir1; isaac-align \
    -r ./PhiX/sorted-reference.xml \
    -b Fastq1 \
    -f fastq \
    --use-bases-mask y150,y150 \
    --variable-read-length yes -m10 \
    -j $SLURM_CPUS_PER_TASK \
    -t /lscratch/$SLURM_JOBID
cd /data/$USER/dir2; isaac-align \
    -r ./PhiX/sorted-reference.xml \
    -b Fastq2 \
    -f fastq \
    --use-bases-mask y150,y150 \
    --variable-read-length yes -m10 \
    -j $SLURM_CPUS_PER_TASK \
    -t /lscratch/$SLURM_JOBID    
cd /data/$USER/dir3; isaac-align \
    -r ./PhiX/sorted-reference.xml \
    -b Fastq3 \
    -f fastq \
    --use-bases-mask y150,y150 \
    --variable-read-length yes -m10 \
    -j $SLURM_CPUS_PER_TASK \
    -t /lscratch/$SLURM_JOBID    
[...]
    

Submit this job using the swarm command.

swarm -f isaac.swarm -g 20 -t 24 --gres=lscratch:20 --module iSAAC/02.16.03.09
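
The swarmfile above repeats the same isaac-align command for each directory, changing only the directory and the Fastq folder. A sketch of generating such a file with a small shell function follows. The function name make_isaac_swarm and the DIR:FASTQ argument format are hypothetical. The generated commands are written on one line each instead of using backslash continuations.

```shell
#!/bin/bash
# Sketch: print one isaac-align swarm command per sample directory.
# make_isaac_swarm is a hypothetical helper. Each argument has the form
# DIR:FASTQ_DIR (e.g. dir1:Fastq1), following the layout of the swarmfile above.
make_isaac_swarm() {
    local pair dir fastq
    for pair in "$@"; do
        dir="${pair%%:*}"     # text before the first colon
        fastq="${pair#*:}"    # text after the first colon
        # The single-quoted format string keeps $USER and the $SLURM_* variables
        # literal, so they are expanded when the job runs, not when the file is written.
        printf 'cd /data/$USER/%s; isaac-align -r ./PhiX/sorted-reference.xml -b %s -f fastq --use-bases-mask y150,y150 --variable-read-length yes -m10 -j $SLURM_CPUS_PER_TASK -t /lscratch/$SLURM_JOBID\n' "$dir" "$fastq"
    done
}

# Example:
#   make_isaac_swarm dir1:Fastq1 dir2:Fastq2 dir3:Fastq3 > isaac.swarm
```

Generating the file this way keeps the alignment options identical across all subjobs, which is easy to get wrong when the lines are edited by hand.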

Interactive job on Biowulf

The following sample session uses the demo dataset provided with iSAAC. Allocate an interactive session with enough memory, CPUs, and local scratch (which will be used for temporary files).

[user@biowulf ~]$ sinteractive --mem=50g --cpus-per-task=32 --gres=lscratch:50
salloc.exe: Pending job allocation 36744333
salloc.exe: job 36744333 queued and waiting for resources
salloc.exe: job 36744333 has been allocated resources
salloc.exe: Granted job allocation 36744333
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn2493 are ready for job

[user@cn2493 ~]$ cd /data/$USER/testjob
[user@cn2493 testjob]$ module load iSAAC/02.16.03.09

[user@cn2493 testjob]$ isaac-sort-reference \
	-g  $ISAAC_HOME/share/iSAAC-02.16.03.09/data/examples/PhiX/iGenomes/PhiX/NCBI/1993-04-28/Sequence/Chromosomes/phix.fa \
	-o ./PhiX
make: Entering directory `/spin1/users/user/iSAAC/PhiX'
/usr/local/apps/iSAAC/02.16.03.09/bin/../share/iSAAC-02.16.03.09/makefiles/reference/../../../../share/iSAAC-02.16.03.09/makefiles/common/../../../../libexec/iSAAC-02.16.03.09/findNeighbors -r /spin1/users/user/iSAAC/PhiX/Temp/neighbor-positions-56.xml \
		--seed-length 56 \
		--neighborhood-distance 1 \
		--mask-width 0 \
		--mask 0 \
		--output-file /spin1/users/user/iSAAC/PhiX/Temp/neighbors-1-56.16bpb.gz.tmp && mv /spin1/users/user/iSAAC/PhiX/Temp/neighbors-1-56.16bpb.gz.tmp /spin1/users/user/iSAAC/PhiX/Temp/neighbors-1-56.16bpb.gz
[2017-03-31 09:15:57]	[biowulf.nih.gov]	[ers/user/iSAAC/PhiX/Temp/neighbors-1-56.16bpb.gz]	make Target:	/spin1/users/user/iSAAC/PhiX/Temp/neighbors-1-56.16bpb.gz
[2017-03-31 09:15:57]	[biowulf.nih.gov]	[ers/user/iSAAC/PhiX/Temp/neighbors-1-56.16bpb.gz]	make Reason:	/spin1/users/user/iSAAC/PhiX/Temp/neighbor-positions-56.xml Temp/.sentinel
[2017-03-31 09:15:57]	[biowulf.nih.gov]	[ers/user/iSAAC/PhiX/Temp/neighbors-1-56.16bpb.gz]	make Prereqs:	/spin1/users/user/iSAAC/PhiX/Temp/neighbor-positions-56.xml Temp/.sentinel
[2017-03-31 09:15:57]	[biowulf.nih.gov]	[ers/user/iSAAC/PhiX/Temp/neighbors-1-56.16bpb.gz]	make Cmd:	/usr/local/apps/iSAAC/02.16.03.09/bin/../share/iSAAC-02.16.03.09/m
[...]
[2017-03-31 09:16:50]	[biowulf.nih.gov]	[all]	make Target:	all
[2017-03-31 09:16:50]	[biowulf.nih.gov]	[all]	make Reason:	sorted-reference.xml
[2017-03-31 09:16:50]	[biowulf.nih.gov]	[all]	make Prereqs:	sorted-reference.xml
[2017-03-31 09:16:50]	[biowulf.nih.gov]	[all]	make Cmd:	[[ 2 == 0 ]] || 1>&2 echo -e "INFO:" "All done!"
[2017-03-31 09:16:50]	[biowulf.nih.gov]	[all]	INFO: All done!
make: Leaving directory `/spin1/users/user/iSAAC/PhiX'

[user@cn2493 testjob]$ isaac-align \
	-r ./PhiX/sorted-reference.xml \
	-b $ISAAC_HOME/share/*/data/examples/PhiX/Fastq \
	-f fastq \
	--use-bases-mask y150,y150 \
	--variable-read-length yes -m10 \
	-j $SLURM_CPUS_PER_TASK \
	-t /lscratch/$SLURM_JOBID
2017-03-31 09:18:18 	[2aaaac264940]	Forcing LC_ALL to C
2017-03-31 09:18:18 	[2aaaac264940]	Version: iSAAC-02.16.03.09
2017-03-31 09:18:18 	[2aaaac264940]	argc: 12 argv: isaac-align -r ./PhiX/sorted-reference.xml -b /usr/local/apps/iSAAC/02.16.03.09/share/iSAAC-02.16.03.09/data/examples/PhiX/Fastq -f fastq --use-bases-mask y150,y150 --variable-read-length yes -m10
2017-03-31 09:18:18 	[2aaaac264940]	FastqReader uncompressedBufferSize_=67108864
2017-03-31 09:18:18 	[2aaaac264940]	Opened  fastq stream on /usr/local/apps/iSAAC/02.16.03.09/share/iSAAC-02.16.03.09/data/examples/PhiX/Fastq/lane1_read1.fastq and base Q0 !
2017-03-31 09:18:18 	[2aaaac264940]	FastqReader uncompressedBufferSize_=67108864
[...]
2017-03-31 09:35:00 	[2aaaac264940]	Generating Build statistics
2017-03-31 09:35:00 	[2aaaac264940]	Generating Build statistics done
2017-03-31 09:35:00 	[2aaaac264940]	Generating the BAM files done
2017-03-31 09:35:00 	[2aaaac264940]	md5 checksum for /spin1/users/user/iSAAC/./Aligned/Projects/default/default/sorted.bam:341a8efa2f33a7f486dd5c0a4cb34c0d
2017-03-31 09:35:00 	[2aaaac264940]	Saving workflow state to "/spin1/users/user/iSAAC/./Temp/AlignerState.txt"
2017-03-31 09:35:00 	[2aaaac264940]	Saving workflow state done to "/spin1/users/user/iSAAC/./Temp/AlignerState.txt"
[user@cn2493 testjob]$ exit
exit
salloc.exe: Relinquishing job allocation 36744333
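
At the end of the run, isaac-align logs an md5 checksum for the sorted BAM file. One possible use of that value is to confirm that the file was not truncated or altered when it is later copied elsewhere. The sketch below assumes the logged value is the MD5 digest of the file contents. The function name check_md5 is hypothetical, and md5sum is the GNU coreutils tool. A digest from one run should not be expected to match the output of a different run.

```shell
#!/bin/bash
# Sketch: compare a file's MD5 digest with an expected value.
# check_md5 is a hypothetical helper; md5sum comes from GNU coreutils.
# Usage: check_md5 FILE EXPECTED_HEX_DIGEST
check_md5() {
    local file="$1" expected="$2" actual
    # Reject missing or empty files before hashing.
    if [ ! -s "$file" ]; then
        echo "error: $file is missing or empty" >&2
        return 1
    fi
    # Reading from stdin makes md5sum print "<digest>  -"; keep only the digest.
    actual=$(md5sum < "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual, expected $expected)" >&2
        return 1
    fi
}

# Example, using the output path and digest from the session log above:
#   check_md5 Aligned/Projects/default/default/sorted.bam 341a8efa2f33a7f486dd5c0a4cb34c0d
```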

Documentation