Gromacs Benchmarks on Biowulf

Please read Making Effective Use of Biowulf's multinode partition before running multinode Gromacs jobs.

Gromacs overview of parallelization and acceleration schemes: Getting good performance from mdrun

For information about Gromacs performance, see the following papers:

  1. Páll et al. (2015) Proc. of EASC 2015, LNCS 8759, 3-27. (DOI)
  2. Kutzner et al. (2015) J. Comput. Chem. 36, 1990-2008. (DOI)
  3. Abraham et al. (2015) SoftwareX 1-2, 19-25. (DOI)

[Gromacs 2018.3 benchmarks]

Gromacs 2022.4: ADH Cubic Benchmark on CPU

February 2023
Gromacs 2022.4 was built with gcc 9.2.0, CUDA 11.0, and OpenMPI 4.1.3. The ADH benchmark suite is available on the Gromacs website.
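On Biowulf, a build like this is typically made available as an environment module. A minimal sketch of loading it and checking the build details (the exact module name is an assumption; verify it first):

module avail gromacs          # list the installed Gromacs modules
module load gromacs/2022.4    # assumed module name; adjust to what is installed
gmx --version                 # reports compiler, CUDA, and MPI details (binary may be gmx_mpi depending on the build)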

All runs were performed on Biowulf Phase2 nodes, each with 28 x 2.3 GHz cores (Intel E5-2695v3) and hyperthreading enabled, connected via 56 Gb/s FDR InfiniBand (1.11:1).

[Plot: ADH cubic throughput (ns/day) vs. number of MPI tasks for the three configurations below]

ntomp=1 (blue line): 1 OpenMP thread per MPI task, 1 MPI task per physical core, no hyperthreading.
ntomp=2-noht (orange line): 2 OpenMP threads per MPI task, no hyperthreading. This uses twice as many physical cores as ntomp=1, with 1 thread per core.
ntomp=2-ht (grey line): 2 OpenMP threads per MPI task, hyperthreading enabled. This uses the same number of physical cores as ntomp=1, but runs a thread on both hyperthreads of each core.
# MPI tasks   ntomp=1 (ns/day)   ntomp=2-noht (ns/day)   ntomp=2-ht (ns/day)
          1            0.745                0.866                 0.867
          2            1.448                1.669                 1.672
          4            2.828                3.244                 3.244
          8            5.372                6.132                 6.121
         16            9.861               10.63                 10.909
         32           16.28                17.803                17.666
         64           28.164               31.117                30.872
        128           43.785               44.728                44.651
        256           67.001               61.482                61.008
        512           87.604               78.883                78.601


For this benchmark, ntomp=2 gives the best throughput up to 128 MPI tasks, with little difference between the hyperthreaded and non-hyperthreaded variants; since ntomp=2-ht uses half as many physical cores as ntomp=2-noht for essentially the same performance, it is the more efficient choice at those task counts.

At 256 MPI tasks and above, ntomp=1 with hyperthreading disabled performs best. In all cases, MPI tasks run most effectively with ntasks-per-core=1.
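As a concrete illustration, a minimal batch script for the 256-task, ntomp=1 case might look like the sketch below. The module name, the gmx_mpi binary name, and the adh_cubic.tpr input file are assumptions; adjust them to the installed build and your own input:

#!/bin/bash
# Sketch: 256 MPI tasks, 1 OpenMP thread per task, hyperthreading not used
module load gromacs/2022.4                     # assumed module name

export OMP_NUM_THREADS=1
srun --ntasks=256 --ntasks-per-core=1 \
    gmx_mpi mdrun -ntomp 1 -s adh_cubic.tpr    # assumed binary and input names

Such a script could be submitted with, for example:
sbatch --partition=multinode --ntasks=256 --ntasks-per-core=1 --exclusive jobscript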

Gromacs 2022.4: ADH Cubic Benchmark on GPU

[Jan 2023] The ADH benchmark suite is available on the Gromacs website; the tests below used the ADH_cubic benchmark.

For running Gromacs on GPUs, see: Running mdrun with GPUs.

k80s and v100s are on Biowulf phase3 nodes: Intel Xeon E5-2680 v4 @ 2.40 GHz.

v100xs are on Biowulf phase5 nodes: Intel Xeon Gold 6140 @ 2.30 GHz.

p100s are on Biowulf phase4 nodes: Intel Xeon E5-2680 v4 @ 2.40 GHz.

a100s are on Biowulf phase6 nodes: AMD EPYC 7543P 32-core processors.

For running GPU jobs, please use the freen command to determine which GPU partitions are available. An appropriate submission command for these jobs would be:
sbatch --partition=gpu \
--gres=gpu:xxx:1 --ntasks=2 \
--ntasks-per-core=1 --exclusive jobscript
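For example, to request a single a100, substitute a100 for xxx (confirm the exact gres name with freen):

sbatch --partition=gpu \
--gres=gpu:a100:1 --ntasks=2 \
--ntasks-per-core=1 --exclusive jobscript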

For this benchmark, there is a significant advantage to running Gromacs on a100 nodes over k80 nodes. There is typically no advantage to running on more than a single GPU device.

Single-GPU jobs were run with ntasks=2, ntasks-per-core=1, and ntomp=2.
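A matching jobscript might look like the minimal sketch below. The module name, the gmx_mpi binary, and the adh_cubic.tpr input are assumptions; -nb gpu simply makes the nonbonded offload to the GPU explicit:

#!/bin/bash
# Sketch: single-GPU run with 2 MPI tasks and 2 OpenMP threads per task
module load gromacs/2022.4                                        # assumed module name

export OMP_NUM_THREADS=2
srun --ntasks=2 gmx_mpi mdrun -ntomp 2 -nb gpu -s adh_cubic.tpr   # assumed binary and input names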

These results may differ for larger molecular systems, so please run your own benchmarks to verify that your jobs benefit from the GPUs (and send us any interesting results!).