Biowulf High Performance Computing at the NIH
raxml on Biowulf

RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees. It can also be used for post-analyses of sets of phylogenetic trees, analyses of alignments, and evolutionary placement of short reads. It was originally derived from fastDNAml, which in turn was derived from Joe Felsenstein's dnaml, which is part of the PHYLIP package. [RAxML website]

There are several different RAxML executables:


raxmlHPC Sequential version built with SSE3. Intended for small to medium datasets and for initial experiments to determine appropriate search parameters.
raxmlHPC-PTHREADS Can run multiple threads on multiple cores of a single node. Works well for longer alignments.
raxmlHPC-MPI Can run multiple MPI processes on multiple cores of multiple nodes. Intended for really large production runs (e.g. 100 or 1,000 bootstraps). It is designed to run multiple inferences or rapid/standard bootstrap (BS) searches in parallel. For all other options, this type of coarse-grained parallelism does not make much sense.
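
The choice between the three executables can be sketched as a small shell function. This is only an illustration: pick_raxml is a hypothetical helper (it is not part of RAxML or the Biowulf environment), and the rule it applies (more than one task means MPI, more than one CPU per task means PTHREADS) is an assumption drawn from the descriptions above. It prints an executable name and does not run anything.

```shell
#!/bin/bash
# Hypothetical helper: pick a RAxML executable from the Slurm allocation.
# pick_raxml is an illustration only; it echoes a name and runs nothing.
pick_raxml() {
    local ntasks="${1:-1}"   # number of MPI tasks, e.g. $SLURM_NTASKS
    local cpus="${2:-1}"     # CPUs per task, e.g. $SLURM_CPUS_PER_TASK
    if [ "$ntasks" -gt 1 ]; then
        echo "raxmlHPC-MPI"        # many processes, possibly many nodes
    elif [ "$cpus" -gt 1 ]; then
        echo "raxmlHPC-PTHREADS"   # many threads on a single node
    else
        echo "raxmlHPC"            # sequential run
    fi
}

# Outside a Slurm job both variables are unset, so this prints raxmlHPC.
pick_raxml "${SLURM_NTASKS:-1}" "${SLURM_CPUS_PER_TASK:-1}"
```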


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load raxml

[user@cn3144 ~]$ raxmlHPC -m BINGAMMA -p 12345 -s binary.phy -n T1 
IMPORTANT WARNING: Sequences t2 and t3 are exactly identical
IMPORTANT WARNING: Sequences t2 and t4 are exactly identical
IMPORTANT WARNING
Found 2 sequences that are exactly identical to other sequences in the alignment.
Normally they should be excluded from the analysis.

Just in case you might need it, an alignment file with 
sequence duplicates removed is printed to file binary.phy.reduced

[...]
Starting final GAMMA-based thorough Optimization on tree 0 likelihood -119.520001 .... 
Final GAMMA-based Score of best tree -119.520001

Program execution info written to /spin1/users/susanc/raxml/RAxML_info.T1
Best-scoring ML tree written to: /spin1/users/susanc/raxml/RAxML_bestTree.T1
Overall execution time: 0.496108 secs or 0.000138 hours or 0.000006 days

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
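
The warning in the sample session says that RAxML writes an alignment with the duplicate sequences removed to binary.phy.reduced. A follow-up run could pick that file up when it exists. The sketch below assumes only that naming pattern; choose_alignment is a hypothetical helper, not a RAxML feature.

```shell
#!/bin/bash
# Hypothetical helper (not part of RAxML): prefer the duplicate-free
# alignment that RAxML writes as <alignment>.reduced, if it exists.
choose_alignment() {
    local aln="$1"
    if [ -f "${aln}.reduced" ]; then
        echo "${aln}.reduced"
    else
        echo "${aln}"
    fi
}

# Prints binary.phy.reduced only if an earlier run created that file;
# otherwise it prints binary.phy unchanged.
choose_alignment binary.phy
```

The result can then be passed to -s, for example raxmlHPC -m BINGAMMA -p 12345 -s "$(choose_alignment binary.phy)" -n T2, where the run ID T2 is a placeholder.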

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. raxml.sh). For example:

#!/bin/bash
set -e
module load raxml
raxmlHPC-PTHREADS -m BINGAMMA -p 12345 -s binary.phy -n T3 -T $SLURM_CPUS_PER_TASK
The '-T $SLURM_CPUS_PER_TASK' flag specifies the number of threads to run. This will automatically be the same as the number of CPUs you allocate using the sbatch command with --cpus-per-task, as in the example below.

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] raxml.sh
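
If the script is ever run outside a Slurm allocation, $SLURM_CPUS_PER_TASK is unset and -T receives no value. One way to guard against that is a default, sketched below. The fallback of 2 threads and the helper name build_raxml_cmd are assumptions made for illustration; the function prints the command line of the threaded executable rather than running it.

```shell
#!/bin/bash
# Sketch: build the threaded RAxML command line with a default for -T.
# build_raxml_cmd is a hypothetical helper; it prints the command only.
build_raxml_cmd() {
    local alignment="$1"   # alignment file passed to -s
    local run_id="$2"      # run name passed to -n
    # Assumed fallback of 2 threads when not running under Slurm.
    local threads="${SLURM_CPUS_PER_TASK:-2}"
    echo "raxmlHPC-PTHREADS -m BINGAMMA -p 12345 -s ${alignment} -n ${run_id} -T ${threads}"
}

build_raxml_cmd binary.phy T3
```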
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. raxml.swarm). For example:

raxmlHPC -m BINGAMMA -p 12345 -s file1.phy -n file1
raxmlHPC -m BINGAMMA -p 12345 -s file2.phy -n file2
raxmlHPC -m BINGAMMA -p 12345 -s file3.phy -n file3
[...]     

Submit this job using the swarm command.

swarm -f raxml.swarm [-g #] --module raxml
where
-g # Number of Gigabytes of memory required for each process (1 line in the swarm command file)
--module raxml Loads the raxml module for each subjob in the swarm
If you are using threads (the 'raxmlHPC-PTHREADS' executable and -T $SLURM_CPUS_PER_TASK), be sure to specify the number of threads to swarm with:
$ swarm -f cmdfile -t # --module raxml
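
For many alignments, the swarmfile lines can be generated rather than typed. The sketch below derives a distinct run name (-n) from each file name, so that output files from different subjobs running in the same directory do not collide. make_swarm_lines is a hypothetical helper, and the .phy suffix and file names are placeholders.

```shell
#!/bin/bash
# Sketch: print one RAxML command per alignment file, with a distinct
# run name (-n) derived from the file name.
# make_swarm_lines is a hypothetical helper, not a swarm or RAxML feature.
make_swarm_lines() {
    local aln run_id
    for aln in "$@"; do
        run_id="$(basename "$aln" .phy)"   # file1.phy -> file1
        echo "raxmlHPC -m BINGAMMA -p 12345 -s ${aln} -n ${run_id}"
    done
}

# The file names are placeholders; they do not need to exist for this demo.
make_swarm_lines file1.phy file2.phy file3.phy
```

Redirecting the output to a file (for example, make_swarm_lines *.phy > raxml.swarm) gives a swarmfile that can be passed to swarm -f.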

MPI RAxML batch job

An MPI job can run in parallel across multiple nodes. It can also run on multiple CPUs of a single node, similar to the threaded version. Set up a batch script along the following lines:

#!/bin/bash
#SBATCH --job-name=raxml

cd /data/susanc/raxml

module load raxml

mpirun -np $SLURM_NTASKS raxmlHPC-MPI  -m BINGAMMA -s binary.phy -p 12345 -n TEST -N 24
Submit the job with:
sbatch --ntasks=8 [--ntasks-per-core=1] jobscript
where
--ntasks=8 tells slurm you need to run 8 MPI processes
--ntasks-per-core=1 runs one MPI process per physical core, i.e. it ignores hyperthreading. This is often best for parallel jobs. Test your job with and without this parameter to check whether it benefits from hyperthreading.
-np $SLURM_NTASKS tells mpirun to run $SLURM_NTASKS processes, which is set to 8 via the sbatch command line. This approach ensures that the number of MPI processes matches the number of tasks allocated by Slurm.
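
Because the mpirun line depends on $SLURM_NTASKS, a job script may want to fail early with a clear message when that variable is missing, for instance if the script is started by hand instead of through sbatch --ntasks. A minimal sketch follows; mpi_cmd is a hypothetical helper that prints the command instead of launching it, and the error message wording is an assumption.

```shell
#!/bin/bash
# Sketch: refuse to build the mpirun command when Slurm has not set
# SLURM_NTASKS. mpi_cmd is a hypothetical helper; it prints the command only.
mpi_cmd() {
    if [ -z "${SLURM_NTASKS:-}" ]; then
        echo "SLURM_NTASKS is not set; submit with sbatch --ntasks=N" >&2
        return 1
    fi
    echo "mpirun -np ${SLURM_NTASKS} raxmlHPC-MPI -m BINGAMMA -s binary.phy -p 12345 -n TEST -N 24"
}

# Prints the command inside a job, or the message on stderr outside one.
mpi_cmd || true
```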