High-Performance Computing at the NIH
RAxML on Biowulf & Helix

RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel maximum-likelihood inference of large phylogenetic trees. It can also be used for post-analyses of sets of phylogenetic trees, analyses of alignments, and evolutionary placement of short reads. It was originally derived from fastDNAml, which in turn was derived from Joe Felsenstein's dnaml, part of the PHYLIP package. [RAxML website]

To add the RAxML executables to your path, you should use the command:

module load raxml
There are several different RAxML executables:

raxmlHPC: Sequential version built with SSE3. Intended for small to medium datasets and for initial experiments to determine appropriate search parameters.
raxmlHPC-PTHREADS: Can run multiple threads on multiple cores of a single node. Works well for longer alignments.
raxmlHPC-MPI: Can run multiple MPI processes on multiple cores of multiple nodes. Intended for really large production runs (e.g. 100 or 1,000 bootstraps). It is designed to do multiple inferences or rapid/standard bootstrap (BS) searches in parallel. For all other options, this type of coarse-grained parallelism offers little benefit.

Important: read the section in the user manual on how many threads/cores to use. Rough rule of thumb: 1 thread/core per 500 DNA site patterns.
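
The rule of thumb above can be turned into a quick calculation. The sketch below is only an illustration: suggest_threads is a hypothetical helper (not part of RAxML), rounding up is an assumption, and the cap stands for the number of CPUs you intend to request on a single node. The first argument is the number of distinct site patterns in your alignment.

```shell
# Hypothetical helper (not part of RAxML): estimate a thread count from the
# number of distinct site patterns, using roughly 1 thread per 500 patterns.
suggest_threads() {
    patterns=$1       # distinct site patterns in the alignment
    max_threads=$2    # upper bound, e.g. the CPUs available on one node

    # Round up so that a partial block of 500 patterns still gets a thread
    # (assumption: the rule itself does not say which way to round).
    threads=$(( (patterns + 499) / 500 ))

    if [ "$threads" -lt 1 ]; then threads=1; fi
    if [ "$threads" -gt "$max_threads" ]; then threads=$max_threads; fi
    echo "$threads"
}

# Example: 2300 patterns, at most 16 CPUs.
suggest_threads 2300 16
```

Treat the result as a starting point and confirm it against the user manual and a short timing test on your own data.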

The test datasets for RAxML are available in /usr/local/apps/raxml/test-data.

Running RAxML on Helix

Sample Session

Serial batch job on Biowulf

Sample batch script:

#!/bin/bash
# This file is raxml.bat

cd /data/user/mydir
module load raxml
# The sequential executable does not use threads, so no -T option is given.
raxmlHPC -m BINGAMMA -p 12345 -s binary.phy -n T3

Submit with:

sbatch raxml.bat

Threaded batch job on Biowulf

Create a script file. Here is a sample batch script:

#!/bin/bash
# This file is raxml.bat

cd /data/user/mydir
module load raxml
# -T sets the number of threads; Slurm fills in the allocated CPU count.
raxmlHPC-PTHREADS -m BINGAMMA -p 12345 -s binary.phy -n T3 -T $SLURM_CPUS_PER_TASK

Submit the script using the 'sbatch' command on Biowulf. You can specify the type of node required in the usual way. The '-T $SLURM_CPUS_PER_TASK' flag sets the number of threads to run. Slurm sets this variable to the number of CPUs you request with --cpus-per-task, as in the sbatch command below:

[user@biowulf]$ sbatch --cpus-per-task=4 raxml.bat
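
Slurm sets SLURM_CPUS_PER_TASK only when --cpus-per-task is given, so a script submitted without that option would pass '-T' with no value. One way to guard against this is a default, sketched below. The helper name raxml_threads is hypothetical, and the fallback of 2 threads is an arbitrary choice for illustration, not a site recommendation.

```shell
# Sketch only: choose the thread count for raxmlHPC-PTHREADS.
# raxml_threads is a hypothetical helper, not part of RAxML or Slurm.
raxml_threads() {
    # Use the Slurm allocation when it is defined; otherwise fall back to 2
    # (an arbitrary value chosen for this example).
    echo "${SLURM_CPUS_PER_TASK:-2}"
}

# Print the command that would be run, so the value can be checked first.
echo "raxmlHPC-PTHREADS -m BINGAMMA -p 12345 -s binary.phy -n T3 -T $(raxml_threads)"
```

In a real batch script the echo would be replaced by the raxmlHPC-PTHREADS command itself.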

Swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

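Set up a swarm command file (e.g. /data/username/cmdfile). Give each command its own run name (-n) so that the output files of jobs running in the same directory do not collide. Here is a sample file:

raxmlHPC -m BINGAMMA -p 12345 -s file1.phy -n T1
raxmlHPC -m BINGAMMA -p 12345 -s file2.phy -n T2
raxmlHPC -m BINGAMMA -p 12345 -s file3.phy -n T3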
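
For more than a handful of alignments, a command file like the one above can be generated rather than typed. The following is a minimal sketch: make_swarm_cmds is a hypothetical helper, and deriving the run name from the alignment file name is just one convenient way to keep the names distinct.

```shell
# Hypothetical helper: print one raxmlHPC command per alignment file.
# The run name passed to -n is taken from the file name (without .phy),
# so each command writes output files under a different name.
make_swarm_cmds() {
    for aln in "$@"; do
        name=$(basename "$aln" .phy)
        echo "raxmlHPC -m BINGAMMA -p 12345 -s $aln -n $name"
    done
}

# Example: print the commands for three alignments.
make_swarm_cmds file1.phy file2.phy file3.phy
```

Redirecting the output to a file, for example make_swarm_cmds *.phy > cmdfile, produces a command file in the format swarm expects.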

Submit this job with:

$ swarm -f cmdfile --module raxml

If you are using threads (the 'raxmlHPC-PTHREADS' executable and -T $SLURM_CPUS_PER_TASK), be sure to specify the number of threads to swarm with:

$ swarm -f cmdfile -t # --module raxml

By default, swarm allocates 4 GB of memory to each process. If more is needed, add the '-g #' flag to swarm to specify the memory required by a single process (one line in the command file above).

For more information about running swarm, see swarm.html.

MPI RAxML batch job

Set up a batch script along the following lines:

#!/bin/bash
#SBATCH --job-name=raxml

cd /data/susanc/raxml

module load raxml

mpirun -np $SLURM_NTASKS raxmlHPC-MPI -m BINGAMMA -s binary.phy -p 12345 -n TEST -N 24

Submit the job with:

sbatch --ntasks=8 [--ntasks-per-core=1] jobscript

--ntasks=8 tells Slurm that you need to run 8 MPI processes.
--ntasks-per-core=1 runs one MPI process per physical core, i.e. it ignores hyperthreading. This is often best for parallel jobs. Test your job with and without this parameter to check whether it benefits from hyperthreading.
-np $SLURM_NTASKS tells mpirun to run $SLURM_NTASKS processes, which is set to 8 by the sbatch command line. This ensures that the number of MPI processes always matches the number of tasks allocated.



RAxML Manual (PDF)