NovoSplice beta is now available for testing.
Novoalign is an aligner for single-end and paired-end reads from the Illumina Genome Analyser. Novoalign finds global optimum alignments using the full Needleman-Wunsch algorithm with affine gap penalties, whilst performing at the same or better speed than aligners that are limited to two mismatches and no insertions or deletions.
Novoalign indexes for some common genome assemblies such as hg18 and hg19 are available in /fdb/novoalign. If there are other genomes you want indexed, please email staff@hpc.nih.gov.
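To see which prebuilt indexes are currently available, you can simply list that directory; for example (the exact contents will change as new indexes are added):

[user@biowulf]$ ls /fdb/novoalign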
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --cpus-per-task=4 --mem=10g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load novocraft
[user@cn3144 ~]$ novoalign -c $SLURM_CPUS_PER_TASK -d celegans -f sim1.fastq sim1r.fastq -o SAM > out1.sam

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch input file (e.g. novo.sh). For example:
#!/bin/bash
set -e

module load novocraft

# cd to the appropriate directory
cd /data/$USER/mydir

# generate an index file named 'celegans' for the sequence file elegans.dna.fa
novoindex celegans elegans.dna.fa

# align the reads in file s_1_sequence.txt against the indexed genome of C. elegans
novoalign -c $SLURM_CPUS_PER_TASK -f s_1_sequence.txt -d celegans -o SAM > out.sam
Submit this job using the Slurm sbatch command.
The value given to --cpus-per-task is passed to the script automatically as $SLURM_CPUS_PER_TASK. Adjust the memory request to your needs with --mem, as in the example below.
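For example, a submission along the following lines (the CPU and memory values are placeholders to adjust for your own data) makes 4 CPUs and 10 GB of memory available to the script above:

sbatch --cpus-per-task=4 --mem=10g novo.sh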
Make sure to request lscratch space, and copy the reference files into lscratch in each command line of the swarm file.
Create a swarmfile (e.g. novo.swarm). For example:
cd /data/$USER/novo1; cp -r /Path/To/RefFiles /lscratch/$SLURM_JOBID; novoalign -c $SLURM_CPUS_PER_TASK -d /lscratch/$SLURM_JOBID/RefFiles/RefFile.nix -f sim1.fastq sim1r.fastq -o SAM > out1.sam
cd /data/$USER/novo2; cp -r /Path/To/RefFiles /lscratch/$SLURM_JOBID; novoalign -c $SLURM_CPUS_PER_TASK -d /lscratch/$SLURM_JOBID/RefFiles/RefFile.nix -f sim1.fastq sim1r.fastq -o SAM > out1.sam
cd /data/$USER/novo3; cp -r /Path/To/RefFiles /lscratch/$SLURM_JOBID; novoalign -c $SLURM_CPUS_PER_TASK -d /lscratch/$SLURM_JOBID/RefFiles/RefFile.nix -f sim1.fastq sim1r.fastq -o SAM > out1.sam
Submit this job using the swarm command.
swarm --gres=lscratch:50 -f novo.swarm -g 12 -t 4 --module novocraft
For more information on using lscratch in swarm jobs, see https://hpc.nih.gov/apps/swarm.html#lscratch
where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file)
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file)
--module | Loads the module for each subjob in the swarm
1. Create a batch script along the lines of the one below:
#!/bin/bash
# the file name is novoMPIScript

# load the latest version of novoalignMPI
module load novocraft

cd /data/$USER/mydir

Nodelist=`make-novo-nodelist`
mpiexec -envall -host $Nodelist -np $SLURM_NTASKS novoalignMPI -c $SLURM_CPUS_PER_TASK -d /fdb/novoalign/chr_all_mm10.nix -f infile1.fq infile2.fq > outputfile
2. Submit the job from the Biowulf login node:
biowulf$ sbatch --partition=multinode --nodes=2 --ntasks=4 --cpus-per-task=28 --constraint=x2695 novoMPIScript
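As a quick sanity check before submitting, you can confirm that the requested tasks and threads fill the nodes you asked for; the sketch below assumes x2695 nodes with 56 CPUs each (the figure used in the benchmark discussion further down), and the values are placeholders:

# back-of-the-envelope check for the request above
ntasks=4; cpus_per_task=28; cpus_per_node=56
echo $(( ntasks * cpus_per_task / cpus_per_node ))   # prints 2, i.e. two full nodes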
Novoindex can use a lot of memory, so it is worthwhile estimating the memory usage before submitting the job, to avoid overloading a node's memory. (Thanks to Colin Hercus of Novocraft for this information.)
The memory used for an indexed genome is approximately
N/2 + 4^(k+1) + 4N/s  bytes
where N is the length of the reference genome (in bases), k is the index k-mer length, and s is the indexing step size. Note that the second term must be converted to the same units as the first and third.
For example, for a 6 GB reference sequence with the default values k=15 and s=2, the index size would be
6G/2 + 4^16 + 4*6G/2 = 3G + 4G + 12G = 19G
It might be better to set the options to -k 15 -s 3, giving an index of ~15G,
or -k 14 -s 3 for an index size of about 13G.
Changing k and s can affect run time, so it may be worth testing a few values to find the best memory/run-time trade-off.
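As a rough sketch, the estimate above can be scripted with bash integer arithmetic so that a few k/s combinations can be compared before building the index; the reference length and file names below are placeholders:

# approximate index size in GB for a ~6 Gbase reference with k=15, s=2
N=$((6 * 1024**3)); k=15; s=2
echo $(( (N/2 + 4**(k+1) + 4*N/s) / 1024**3 ))   # prints 19 (GB)
# to build the index with those settings:
# novoindex -k $k -s $s myref_k15s2.nix myref.fa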
In general, parallel jobs should scale to at least 70% efficiency for the sake of other Biowulf users. One user using twice the resources to squeeze out 10% more performance may be keeping other users from working at all. The efficiency of a parallel job can be calculated as follows, where e is the efficiency, n is the number of nodes, t1 is the run time on one node, and tn is the run time on n nodes.
e = t1/(n*tn)
For example if a job benchmarks at 20 minutes when running on one node and at 5 minutes when running on 8 nodes, we can figure the efficiency of scaling like this:
e = 20/(8*5) = 0.5 = 50%
50% is well below the 70% guideline; this job does not scale well out to 8 nodes and would therefore use too many resources to justify the benefit. Indeed, adding nodes may even slow the overall wall-time of the job. This type of benchmarking should be done for each type of job you intend to run before committing to long production runs.
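For instance, the 8-node example above can be checked with a one-liner (timings in minutes):

t1=20; n=8; tn=5
echo "scale=2; $t1/($n*$tn)" | bc   # prints .50, i.e. 50% efficiency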
To find the most appropriate number of nodes for a specific type of job, it is essential to run one's own benchmarks.
The following novoalignMPI benchmark was run with paired-end reads (two 27 GB fastq files) against the hg19 index, on either x2650 or x2695 nodes.
Since the master process counts as a task, it can occupy an entire node, depending on how many CPUs per task are requested. For example, if 3 tasks with 56 CPUs per task are requested, the node running the master process does practically nothing and only 2 nodes run novoalign alignments.
The following table has therefore been compiled so that only nodes actually running novoalign alignments are counted.
No. of Nodes | x2650 | x2695 | Efficiency
---|---|---|---
1 | 42m | | 100%
2 | 21m | | 100%
4 | 12m | | 87.5%
8 | 7m | | 75%
16 | 4.5m | | 58%
1 | | 26m | 100%
2 | | 13m | 100%
4 | | 8.5m | 76%
8 | | 5.5m | 59%
16 | | 4.5m | 36%
Based on the benchmarks above, users should not use more than 8 nodes on x2650 or 4 nodes on x2695.
The following novoalign (non-MPI) benchmark was run with the same paired-end reads (two 27 GB fastq files) against the hg19 index, on either x2650 or x2695 nodes.
CPUS | x2650 | x2695 | Efficiency
---|---|---|---
1 | 643m | | 100%
2 | 343m | | 94%
4 | 224m | | 72%
8 | 147m | | 55%
16 | 79m | | 51%
32 | 46m | | 44%
1 | | 600m | 100%
2 | | 491m | 61%
4 | | 283m | 53%
8 | | 148m | 51%
16 | | 81m | 46%
32 | | 49m | 38%