GROMACS (www.gromacs.org) is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins and lipids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.
Gromacs can multi-thread as well as use MPI. For small jobs, e.g. 8 CPUs on a single node, multi-threading works almost as well as MPI. For larger jobs, it is best to use MPI. See the Benchmarks page for details.
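For example (a minimal sketch; 'topol.tpr' stands in for your own run input file):

# single node: 8 thread-MPI ranks, no mpirun needed
gmx mdrun -ntmpi 8 -ntomp 1 -s topol.tpr

# multiple nodes: one MPI rank per allocated task, using the MPI-enabled binary
mpirun -np $SLURM_NTASKS gmx_mpi mdrun -ntomp 1 -s topol.tpr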
Specifying a homogeneous set of nodes
The 'multinode' partition, to which all jobs that require more than a single node must be submitted, is heterogeneous. For efficient parallel jobs, you need to ensure that you request nodes of a single CPU type. For example, at the time of writing, the 'freen' command displays:
biowulf% freen
...
multinode    65/466    3640/26096   28   56   248g   400g   cpu56,core28,g256,ssd400,x2695,ibfdr
multinode     4/190     128/6080    16   32    60g   800g   cpu32,core16,g64,ssd800,x2650,ibfdr
multinode   312/539   17646/30184   28   56   250g   800g   cpu56,core28,g256,ssd800,x2680,ibfdr
...

These lines indicate that there are 3 kinds of nodes in the multinode partition. You should submit your job exclusively to one kind of node by specifying --constraint=x2695 or --constraint=x2680, as in the examples below. (Note that Gromacs 2018.3 and 2020.2 will not currently run on the x2650 nodes.)
Sample MPI batch script for Gromacs 2022.4
#!/bin/bash

module load gromacs/2022.4
mpirun gmx_mpi mdrun -ntomp 1 -s topol.tpr

Gromacs will use GPUs if available, but will run on CPUs if not. Therefore, the same batch script will work for both GPUs and CPUs. Sample submission commands:
sbatch --ntasks=# --ntasks-per-core=1 --nodes=1 run.2022.4                   # CPUs
sbatch -p gpu --gres=gpu:p100:1 --ntasks=1 --ntasks-per-core=1 run.2022.4   # for 1 p100 GPU

where 'p100' can be replaced by whichever GPU type is desired. Use 'freen' to see GPUs available.
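For instance, a simple filter on the freen output will show the GPU node types (a sketch; adjust the pattern as needed):

freen | grep -i gpu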
Sample MPI batch script for Gromacs 2021.3 and earlier:
#!/bin/bash

module load gromacs/2018
mpirun -np $SLURM_NTASKS `which mdrun_mpi` -ntomp 1 -s ion_channel.tpr -maxh 0.50 \
    -resethway -noconfout -nsteps 1000

Submit this job with:
sbatch --partition multinode --constraint=x2695 --job-name=gmx --ntasks=# \
    --ntasks-per-core=1 --time=168:00:00 --exclusive jobscript

where
--partition multinode | Submit to the multinode partition where all nodes are Infiniband-connected |
--constraint=x2695 | All nodes should be x2695's. |
--ntasks=# | The number of MPI processes you wish to run (see the worked example after this table). |
--ntasks-per-core=1 | Ensures that Gromacs will only run 1 MPI process per physical core (i.e. will not use both hyperthreads of a core). This is recommended for parallel jobs. |
-ntomp 1 | Uses only one OpenMP thread per MPI process. This means that Gromacs will run using only MPI, which provides the best performance. |
--time=168:00:00 | max walltime=168 hrs (1 week). See the section on chaining jobs below. |
--exclusive | Allocate the nodes exclusively to this job (recommended for parallel jobs) |
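As a worked example based on the freen output above: the x2695 nodes have 28 physical cores each, so a job spanning 4 whole nodes would request 112 MPI tasks:

sbatch --partition multinode --constraint=x2695 --job-name=gmx --ntasks=112 \
    --ntasks-per-core=1 --time=168:00:00 --exclusive jobscript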
Sample batch script for Gromacs 4.6.5 (thanks to Mingzhen Zhang)
#!/bin/bash

module load gromacs/2018
cd $SLURM_SUBMIT_DIR
mpirun -np $SLURM_NTASKS `which mdrun_mpi` -ntomp 1 -s cmd_.tpr -maxh 0.50 \
    -resethway -noconfout -cpi state.cpt -noappend -multi 48 -replex 1000
Submit with:
sbatch --partition multinode --constraint=x2695 --job-name=MyJob --ntasks=64 --ntasks-per-core=1 --exclusive myjobscript
GPU support is built into Gromacs 5.* and later. Below is a sample batch script running the adh_cubic job from the Gromacs GPU documentation. The files for this job are available in /usr/local/apps/gromacs/ADH_bench_systems.tar.gz (untar the file and look in the adh_cubic directory).
Note that only the following versions on Biowulf are built with CUDA/GPU support:
Sample batch script:
#!/bin/bash

module load gromacs/2018.3
mkdir -p /data/$USER/gromacs
cd /data/$USER/gromacs
tar xvzf /usr/local/apps/gromacs/adh_cubic.tar.gz
cd adh_cubic
mpirun -np $SLURM_NTASKS --mca btl_openib_if_exclude "mlx4_0:1" `which mdrun_mpi` \
    -ntomp $SLURM_CPUS_PER_TASK -s topol.tpr
Submit this job with, for example:
sbatch --partition=gpu --gres=gpu:k80:2 --ntasks=2 --ntasks-per-core=1 --cpus-per-task=1 --time=HH:MM:SS jobscript
The above command will submit the job to 2 GPUs.
-ntomp 1 | sets 1 CPU thread per MPI (GPU) process. In our tests a 1:1 CPU:GPU ratio gave the best performance (see benchmarks). |
--mca btl_openib_if_exclude "mlx4_0:1" | prevents a warning about OpenFabrics from appearing in your output. You can also leave it out and live with the warning :-). |
--partition=gpu | submit to the GPU partition |
--gres=gpu:k80:2 | Resource=gpu, Resource type=k80, Count=2 (2 GPUs). Note that the count is required even if you are submitting to a single GPU. |
--ntasks=2 | Number of MPI tasks to spawn. |
--ntasks-per-core=1 | Run only 1 MPI task per physical core. This is best for most MD jobs |
--time=HH:MM:SS | walltime to be allocated for the job -- HH hours, MM minutes, SS seconds |
--cpus-per-task=1 | Number of threads per MPI task. You should run your own benchmarks to determine the best values for ntasks and cpus-per-task (see the sketch after this table). |
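One simple way to run such benchmarks is to submit the same job script with a few different values of --cpus-per-task (a sketch; the GPU type, task counts, and time limit are placeholders to adjust for your system):

for c in 1 2 4; do
    sbatch --partition=gpu --gres=gpu:k80:2 --ntasks=2 --ntasks-per-core=1 \
        --cpus-per-task=$c --time=1:00:00 jobscript
done

The batch script above picks up each value through -ntomp $SLURM_CPUS_PER_TASK.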
The max walltime on the multinode partition is 10 days. (type 'batchlim' to see the CPU and walltime limits on all partitions). Thus, jobs should be designed to run for a week or so, save a checkpoint file, and submit a new job starting from that checkpoint.
A reasonable strategy would be to set up a job to run for a week or less by setting the number of steps appropriately, and then, at the end of the job, have it resubmit itself to continue the simulation. Below is a sample batch script:
#!/bin/bash
# this script is called Run.ib

module load gromacs/2018.3
cd /path/to/my/dir

mpirun -np $SLURM_NTASKS `which mdrun_mpi` -ntomp 1 -s topol.tpr -maxh 0.50 \
    -resethway -noconfout -nsteps 1000

# use gmx convert-tpr (the successor to tpbconv) to create a new topol.tpr
# with an increased number of steps
gmx convert-tpr -s topol.tpr -extend 500 -o topol2.tpr

# move the newly created topol.tpr into place
mv topol.tpr topol.tpr.prev; mv topol2.tpr topol.tpr

# resubmit this script
sbatch --partition multinode --constraint=x2680 --job-name=gmx --ntasks=# \
    --ntasks-per-core=1 --time=168:00:00 --exclusive Run.ib

More information at Extending Simulations on the Gromacs site.
If a Gromacs job is terminated unexpectedly (for example, the walltime limit was hit before the mdrun completed), it is simple to restart. The state.cpt file contains all the information necessary to continue the simulation. Use the '-cpi' and '-append' options to mdrun, which will append to existing energy, trajectory and log files. For example:
mpirun -n $np `which mdrun_mpi` -s topol.tpr -cpi state.cpt -append
More information at Doing Restarts on the Gromacs website.
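The restart and chaining approaches can be combined, so that each job continues from the checkpoint and resubmits itself near the end of its walltime. A sketch (the script name Chain.ib is hypothetical; in production use, add a check for simulation completion before resubmitting):

#!/bin/bash
# hypothetical script name: Chain.ib
module load gromacs/2018.3
cd /path/to/my/dir

# continue from the checkpoint, appending to the existing output files;
# -maxh 167 makes mdrun write a checkpoint and exit cleanly before the
# 168-hour walltime limit
mpirun -np $SLURM_NTASKS `which mdrun_mpi` -ntomp 1 -s topol.tpr \
    -cpi state.cpt -append -maxh 167

# resubmit this script to continue the simulation
sbatch --partition multinode --constraint=x2680 --job-name=gmx --ntasks=# \
    --ntasks-per-core=1 --time=168:00:00 --exclusive Chain.ib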
Running on a mix of node types will effectively mean running on the slowest type of node, so constrain your job to a single node type, e.g.

#SBATCH --constraint=x2695
A well set up multinode job allocates whole nodes of a single type, e.g.:

#SBATCH --ntasks=112
#SBATCH --ntasks-per-core=1
#SBATCH --nodes=4
#SBATCH --exclusive
#SBATCH --constraint=nodetype
Example of a well set up job:

biowulf% jobload -j 11111111
       JOBID            TIME                NODES   CPUS  THREADS   LOAD      MEMORY
                  Elapsed / Wall                   Alloc   Active          Used / Alloc
    11111111   0-04:05:52 / 2-02:00:00     cn1135     56       28    50%   0.8 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn2170     56       28    50%   0.8 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn2171     56       28    50%   0.8 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn2172     56       28    50%   0.8 / 4.0 GB

This job runs one active thread on each of the 28 physical cores of its 4 identical nodes; the 50% load simply reflects that the second hyperthread of each core is deliberately left idle (--ntasks-per-core=1).
Example of a poorly set up job:

biowulf% jobload -j 11111111
       JOBID            TIME                NODES   CPUS  THREADS   LOAD      MEMORY
                  Elapsed / Wall                   Alloc   Active          Used / Alloc
    11111111   0-04:05:52 / 2-02:00:00     cn1135      4        4   100%   0.3 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn2170     16       13    81%   0.8 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn2171     16       13    81%   0.8 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn2172     48       13    27%   0.8 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn3024     44       13    30%   0.8 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn3025     44       13    30%   0.8 / 4.0 GB

This job is spread across 6 nodes with uneven CPU allocations, leaving most of the allocated CPUs idle.
See the Benchmarks page.