High-Performance Computing at the NIH
GitHub YouTube @nih_hpc RSS Feed
Novocraft on Biowulf & Helix

Description

Novoalign is an aligner for single-ended and paired-end reads from the Illumina Genome Analyser. Novoalign finds global optimum alignments using full Needleman-Wunsch algorithm with affine gap penalties whilst performing at the same or better speed than aligners that are limited to two mismatches and no insertions or deletions.

Novoalign indexes for some common genome assemblies such as hg18 and hg19 are available in /fdb/novoalign. If there are other genomes you want indexed, please email staff@hpc.nih.gov

Running on Helix

$ cd /data/$USER/dir
$ module load novocraft
$ novoalign -c 4 -d /fdb/novoalign/chr_all_hg19.nbx -f read_1.fastq.gz read_2.fastq.gz -i 200,50

Running a single batch job on Biowulf

See Benchmarks

1. Create a batch script along the following lines:

#!/bin/bash 

module load novocraft

# cd to the appropriate directory
cd /data/$USER/mydir

# generate an index file named 'celegans' for the sequence file elegans.dna.fa
novoindex celegans elegans.dna.fa

# align the reads in file s_1_sequence.txt against the indexed genome of C.Elegans.
novoalign -c $SLURM_CPUS_PER_TASK -f s_1_sequence.txt -d celegans -o SAM > out.sam

2. on the biowulf login node, submit the job:

$ sbatch --cpus-per-task=4 --mem=10g myscript

The number assigned to '--cpus-per-task' will be passed to the $SLURM_CPUS_PER_TASK in the script automatically. User can adjust memory requirement based on needs using --mem as in the example.

Running a swarm of batch jobs on Biowulf

1. Create a swarm file along the following lines:

cd /data/$USER/novo1; novoalign -c $SLURM_CPUS_PER_TASK -d celegans -f sim1.fastq sim1r.fastq -o SAM > out1.sam
cd /data/$USER/novo2; novoalign -c $SLURM_CPUS_PER_TASK -d celegans -f sim2.fastq sim2r.fastq -o SAM > out2.sam
cd /data/$USER/novo3; novoalign -c $SLURM_CPUS_PER_TASK -d celegans -f sim3.fastq sim3r.fastq -o SAM > out3.sam
[....]

Submit this swarm with:

$ swarm -t 4 -g 12 -f swarmfile --module novocraft

For more information regarding running swarm, see swarm.html

Running a NovoalignMPI or NovoalignCSMPI batch job on Biowulf

1. Create a batch script along the lines of the one below:

#!/bin/bash 
# the file name is novoMPIScript

# load the latest version of novoalignMPI
module load novocraft
cd /data/$USER/mydir
Nodelist=`make-novo-nodelist`
mpiexec -envall -host $Nodelist -np $SLURM_NTASKS novoalignMPI -c $SLURM_CPUS_PER_TASK -d /fdb/novoalign/chr_all_mm10.nix -f infile1.fq infile2.fq > outputfile

3. submit job on the biowulf headnode:

biowulf $ sbatch --partition=multinode --nodes=2 --ntasks=4 --cpus-per-task=28 --constraint=x2695 mpi.script 

Running an interactive job on Biowulf

It may be useful for debugging purposes to run novoalign jobs interactively. Such jobs should not be run on the Biowulf login node. Instead allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive --cpus-per-task=4 --mem=10g  
salloc.exe: Granted job allocation 16535

pxxx$ module load novocraft
pxxx$ cd /data/$USER/dir
pxxx$ novoalign -c $SLURM_CPUS_PER_TASK -d celegans -f sim1.fastq sim1r.fastq -o SAM > out1.sam
[...etc...]

pxxx$ exit
exit

biowulf$

Make sure to exit the job once you have finished your run.

Novoindex memory usage

Novoindex can use a lot of memory, so it is worthwhile estimating the memory usage before submitting the job, to prevent nodes with overloaded memory. (Thanks to Colin Hercus of Novocraft for this information).

The memory used for a indexed genome is
N/2 + 4(k+1) + 4N/s
where N is the length of the reference genome, k the index k-mer length and s the indexing step size. Note that the second term must be converted to the same units as the first and third.

For example, for a 6GB reference sequence, with default values k=15 and s=2, the index size would be
6G/2 + 416 +4*6G/2 = 3G + 4G + 12G = 20G

It might be better to set the options as -k=15 -s=3 and then have index of ~ 15G
or -k=14 -s=3 for an index size of 13G.

Changing k&s can have an effect on run time so it might be worth testing with a few values to find the best memory/run time trade off.

Novoalign & NovoalignMPI benchmark

In general, parallel jobs should scale to at least 70% efficiency for the sake of other Biowulf users. One user using twice the resources to squeeze out 10% more performance may be keeping other users from working at all. The efficiency of a parallel job can be calculated as follows, where e is efficiency, n is the number of processors running the simulation, t1 is the performance time running on one node and tn is the performance time running on n nodes.

       e = t1/(n*tn)  

For example if a job benchmarks at 20 minutes when running on one node and at 5 minutes when running on 8 nodes, we can figure the efficiency of scaling like this:

       e= 20/(8*5) = 50% 

50% is way below the 70% guideline; this job does not scale well out to 8 nodes and would therefore use too many resources to justify any benefit. Indeed it may be the case that adding nodes would slow the actual over-all performance (wall-time) of the job. This type of benchmarking should be done on all systems you intend to simulate before long-term runs.

To find the most appropriate number of nodes for a specific type of job, it is essential to run one's own benchmarks.

The following novoalignMPI benchmark was done using paired reads, each 27 gb, against hg19 index file, using either x2650 or x2695 nodes.

Since a master process is considered a task, it could take up to one node depend on how many cpus-per-task is set. For example, if 3 tasks and 56 cpus-per-task is requested, then 1 node which run the master process will pratically do nothing and only 2 nodes have novoalign jobs running.

The following table has been colaborated so that only nodes running novoalign jobs are counted.

No. of Nodes
x2650
x2695
Efficiency
1
42m
100%
2
21m
100%
4
12m
87.5%
8
7m
75%
16
4.5m
58%
1
26m
100%
2
13m
100%
4
8.5m
76%
8
5.5m
59%
16
4.5m
36%

Based on the benchmark above, user should not use more than 4 nodes for x2695 and 8 nodes for x2650.

 

The following novoalign (non-MPI) benchmark was done using paired reads, each 27 gb, against hg19 index file, using either x2650 or x2695.

CPUS x2650 x2695 Efficiency
1 643m   100
2 343m   94
4 224m   72
8 147m   55
16 79m   51
32 46m   44
1   600m 100
2   491m 61
4   283m 53
8   148m 51
16   81m 46
32   49m 38

 

Documentation

Novocraft homepage, see tab for documentation