Novocraft on Biowulf

NovoSplice beta is now available for testing.

Novoalign is an aligner for single-end and paired-end reads from the Illumina Genome Analyser. Novoalign finds global optimum alignments using the full Needleman-Wunsch algorithm with affine gap penalties, while performing at the same or better speed than aligners that are limited to two mismatches and no insertions or deletions.

Novoalign indexes for some common genome assemblies such as hg18 and hg19 are available in /fdb/novoalign. If there are other genomes you would like indexed, please email staff@hpc.nih.gov.
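
For example, to see which prebuilt indexes are available and to point novoalign at one of them (run the alignment inside an interactive session or batch job, not on the login node; the mm10 index file name below is the one used in the MPI example further down, and the FASTQ file names are placeholders):

ls /fdb/novoalign
module load novocraft
novoalign -c 4 -d /fdb/novoalign/chr_all_mm10.nix -f sim1.fastq sim1r.fastq -o SAM > out1.sam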

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --cpus-per-task=4 --mem=10g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load novocraft
[user@cn3144 ~]$ novoalign -c $SLURM_CPUS_PER_TASK -d celegans -f sim1.fastq sim1r.fastq -o SAM > out1.sam
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. novo.sh). For example:

#!/bin/bash
set -e

module load novocraft

# cd to the appropriate directory
cd /data/$USER/mydir

# generate an index file named 'celegans' for the sequence file elegans.dna.fa
novoindex celegans elegans.dna.fa

# align the reads in file s_1_sequence.txt against the indexed genome of C. elegans
novoalign -c $SLURM_CPUS_PER_TASK -f s_1_sequence.txt -d celegans -o SAM > out.sam

Submit this job using the Slurm sbatch command.

$ sbatch --cpus-per-task=4 --mem=10g novo.sh

The value given to --cpus-per-task is passed automatically to $SLURM_CPUS_PER_TASK inside the script. Adjust the memory allocation as needed with --mem, as in the example.
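
If the same script is occasionally tested outside of sbatch, where $SLURM_CPUS_PER_TASK is not set, a defensive default can be used. This is a minimal sketch of that idiom, not part of the script above:

# fall back to 2 threads when $SLURM_CPUS_PER_TASK is undefined
novoalign -c ${SLURM_CPUS_PER_TASK:-2} -f s_1_sequence.txt -d celegans -o SAM > out.sam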

 
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Make sure to request lscratch space and have each command line in the swarm file copy its reference files to lscratch.

Create a swarmfile (e.g. novo.swarm). For example:

cd /data/$USER/novo1; cp -r /Path/To/RefFiles /lscratch/$SLURM_JOBID; novoalign -c $SLURM_CPUS_PER_TASK -d /lscratch/$SLURM_JOBID/RefFiles/RefFile.nix -f sim1.fastq sim1r.fastq -o SAM > out1.sam
cd /data/$USER/novo2; cp -r /Path/To/RefFiles /lscratch/$SLURM_JOBID; novoalign -c $SLURM_CPUS_PER_TASK -d /lscratch/$SLURM_JOBID/RefFiles/RefFile.nix -f sim1.fastq sim1r.fastq -o SAM > out1.sam
cd /data/$USER/novo3; cp -r /Path/To/RefFiles /lscratch/$SLURM_JOBID; novoalign -c $SLURM_CPUS_PER_TASK -d /lscratch/$SLURM_JOBID/RefFiles/RefFile.nix -f sim1.fastq sim1r.fastq -o SAM > out1.sam

Submit this job using the swarm command.

swarm --gres=lscratch:50 -f novo.swarm -g 12 -t 4 --module novocraft

For more information regarding using lscratch in swarm, see https://hpc.nih.gov/apps/swarm.html#lscratch

where
-g # Number of Gigabytes of memory required for each process (1 line in the swarm command file)
-t # Number of threads/CPUs required for each process (1 line in the swarm command file).
--module Loads the module for each subjob in the swarm
Running a NovoalignMPI or NovoalignCSMPI batch job on Biowulf

1. Create a batch script along the lines of the one below:

#!/bin/bash 
# the file name is novoMPIScript

# load the latest version of novoalignMPI
module load novocraft
cd /data/$USER/mydir
Nodelist=`make-novo-nodelist`
mpiexec -envall -host $Nodelist -np $SLURM_NTASKS novoalignMPI -c $SLURM_CPUS_PER_TASK -d /fdb/novoalign/chr_all_mm10.nix -f infile1.fq infile2.fq > outputfile

2. Submit the job from the Biowulf login node:

[user@biowulf]$ sbatch --partition=multinode --nodes=2 --ntasks=4 --cpus-per-task=28 --constraint=x2695 novoMPIScript

Novoindex memory usage

Novoindex can use a lot of memory, so it is worthwhile estimating the memory usage before submitting the job, to avoid overloading the memory on the node. (Thanks to Colin Hercus of Novocraft for this information.)

The memory used for an indexed genome is approximately
N/2 + 4^(k+1) + 4N/s bytes
where N is the length of the reference genome, k is the index k-mer length, and s is the indexing step size. Note that the second term, 4^(k+1) bytes, must be converted to the same units as the first and third.

For example, for a 6 GB reference sequence with the default values k=15 and s=2, the index size would be
6G/2 + 4^16 + 4*6G/2 = 3G + 4G + 12G = 19G

It might be better to set the options to -k 15 -s 3, which gives an index of ~15G,
or -k 14 -s 3 for an index size of about 13G.

Changing k and s can affect run time, so it may be worth testing a few values to find the best memory/run-time trade-off.
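
The arithmetic above is easy to script. The helper below is a minimal sketch assuming the formula as given (the script name and interface are made up for illustration and are not part of Novocraft):

#!/bin/bash
# estimate_novoindex_mem.sh -- hypothetical helper, not part of Novocraft
# Usage: estimate_novoindex_mem.sh <reference_length_in_bases> <k> <s>
# Implements the estimate N/2 + 4^(k+1) + 4N/s bytes and reports it in GB.
N=$1; k=$2; s=$3
awk -v N="$N" -v k="$k" -v s="$s" 'BEGIN {
    bytes = N/2 + 4^(k+1) + 4*N/s
    printf "estimated index size: %.1f GB\n", bytes / 1e9
}'

For the 6 GB example above:

bash estimate_novoindex_mem.sh 6000000000 15 2    # ~19.3 GB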

Novoalign & NovoalignMPI benchmark

In general, parallel jobs should scale to at least 70% efficiency for the sake of other Biowulf users. One user using twice the resources to squeeze out 10% more performance may be keeping other users from working at all. The efficiency of a parallel job can be calculated as follows, where e is the efficiency, n is the number of nodes used, t1 is the run time on one node, and tn is the run time on n nodes.

       e = t1/(n*tn)  

For example, if a job runs in 20 minutes on one node and 5 minutes on 8 nodes, the scaling efficiency is:

       e= 20/(8*5) = 50% 

50% is well below the 70% guideline; this job does not scale well to 8 nodes and would therefore use too many resources to justify the benefit. Indeed, adding nodes may actually slow the overall wall-time of the job. This type of benchmarking should be done for every type of job you intend to run before starting long production runs.

To find the most appropriate number of nodes for a specific type of job, it is essential to run one's own benchmarks.
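
The efficiency arithmetic can be scripted in the same way when comparing benchmark runs; this is a minimal sketch (the script name and interface are illustrative only):

#!/bin/bash
# efficiency.sh -- hypothetical helper for e = t1/(n*tn)
# Usage: efficiency.sh <t1_minutes> <n_nodes> <tn_minutes>
t1=$1; n=$2; tn=$3
awk -v t1="$t1" -v n="$n" -v tn="$tn" \
    'BEGIN { printf "efficiency: %.0f%%\n", 100 * t1 / (n * tn) }'

For the worked example above:

bash efficiency.sh 20 8 5    # prints "efficiency: 50%"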

The following novoalignMPI benchmark was run with paired-end read files of 27 GB each against the hg19 index, on either x2650 or x2695 nodes.

Since the master process counts as a task, it can occupy an entire node depending on how many CPUs per task are requested. For example, if 3 tasks with 56 CPUs per task are requested, the node running the master process does practically no work and only 2 nodes actually run novoalign.

The following table has been adjusted so that only nodes running novoalign are counted.

No. of Nodes    x2650    x2695    Efficiency
1               42m               100%
2               21m               100%
4               12m               87.5%
8               7m                75%
16              4.5m              58%
1                        26m      100%
2                        13m      100%
4                        8.5m     76%
8                        5.5m     59%
16                       4.5m     36%

Based on the benchmark above, users should not use more than 4 nodes on x2695 or 8 nodes on x2650.

 

The following novoalign (non-MPI) benchmark was run with paired-end read files of 27 GB each against the hg19 index, on either an x2650 or an x2695 node.

CPUs    x2650    x2695    Efficiency
1       643m              100%
2       343m              94%
4       224m              72%
8       147m              55%
16      79m               51%
32      46m               44%
1                600m     100%
2                491m     61%
4                283m     53%
8                148m     51%
16               81m      46%
32               49m      38%