Gromacs on Biowulf

GROMACS (www.gromacs.org) is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins and lipids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

Important Notes

Current Versions

Gromacs can multi-thread as well as use MPI. For small jobs, e.g. 8 CPUs on a single node, multi-threading works almost as well as MPI. For larger jobs, it is best to use MPI; see the Benchmarks page for details.
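
As a rough illustration (a sketch only, assuming both the thread-MPI gmx binary and the MPI gmx_mpi binary from the modules described below, and an existing topol.tpr):

    # small single-node job: let the thread-MPI build spread over 8 cores by itself
    gmx mdrun -ntmpi 8 -ntomp 1 -s topol.tpr

    # larger job: one real MPI rank per Slurm task, launched with mpirun inside a batch script
    mpirun -np $SLURM_NTASKS gmx_mpi mdrun -ntomp 1 -s topol.tpr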

Considerations for Gromacs jobs on rhel8:

The following sub-tabs have examples for each version, along with their corresponding dependency builds:

Gromacs version 2021.3 is built with both CPU and GPU compatibility

  • Module Name: gromacs/2021.3

    Getting Started
    [user@biowulf]$ module load gromacs/2021.3
    [+] Loading gcc  8.5.0  ...
    [+] Loading openmpi/4.1.4/CUDA-11.4 gcc-8.5.0  ...
    [+] Loading CUDA Toolkit  11.4.4  ...
    [+] Loading cmake 3.23.0  ...
    [+] Loading Gromacs 2021.3  ...
    [+] Built with gcc 8.5 CUDA 11.4, OpenMPI 4.1.4
    
    [user@biowulf]$ mkdir /data/$USER/gromacs && cd /data/$USER/gromacs
    [user@biowulf]$ tar xvzf /usr/local/apps/gromacs/tests/ADH_bench_systems.tar.gz
    [user@biowulf]$ cd ADH/adh_cubic
    
    #create MD input file using gromacs preprocessor
    [user@biowulf adh_cubic]$ gmx grompp -f pme_verlet.mdp -c conf.gro -p topol.top 
    :-) GROMACS - gmx grompp, 2021.3 (-:
    ...
    GROMACS:      gmx grompp, version 2021.3
    Executable:   /usr/local/apps/gromacs/2021.3/bin/gmx
    Data prefix:  /usr/local/apps/gromacs/2021.3
    Working dir:  /gpfs/gsfs10/users/apptest4/gromacs/2021-ADH/adh_cubic
    ...
    
    
    Sample MPI batch script for Gromacs 2021.3
    #!/bin/bash
    
    module load gromacs/2021.3
    
    mpirun -np $SLURM_NTASKS gmx_mpi mdrun -ntomp 1 -s topol.tpr
    
    
    For jobs running on CPUs, it is recommended to use homogeneous nodes with --constraint=x2695. GPUs are also available for version 2021.3; you can submit these jobs using the following:
    sbatch --ntasks=# --ntasks-per-core=1 --nodes=1 run.2021.3  #ntasks < 16
    
    sbatch -p multinode --ntasks=# --ntasks-per-core=1 --nodes=1 run.2021.3  #ntasks > 16
    
    sbatch -p gpu --gres=gpu:a100:1 --ntasks=1 --ntasks-per-core=1 run.2021.3  # for 1 a100 GPU
    
    
    where:
    --partition=multinode Submit to the multinode partition where all nodes are Infiniband-connected
    --constraint=x2695 All nodes should be x2695's.
    --ntasks=# the number of MPI processes you wish to run.
    --ntasks-per-core=1 ensures that Gromacs will only run 1 MPI process per physical core (i.e. it will not use both hyperthreaded CPUs). This is recommended for parallel jobs.
    -ntomp 1 uses only one OpenMP thread per MPI rank. This means that Gromacs will run using only MPI, which provides the best performance.
    --time=168:00:00 max walltime=168 hrs (1 week). See the section on chaining jobs below.
    --exclusive Allocate the nodes exclusively to this job (recommended for parallel jobs)
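
    Putting these options together, a complete submission script might look like the following sketch (the job name, node/task counts, and walltime are illustrative placeholders; two 28-core x2695 nodes give 56 MPI ranks):

    #!/bin/bash
    #SBATCH --job-name=gmx
    #SBATCH --partition=multinode
    #SBATCH --constraint=x2695
    #SBATCH --nodes=2
    #SBATCH --ntasks=56
    #SBATCH --ntasks-per-core=1
    #SBATCH --time=168:00:00
    #SBATCH --exclusive

    module load gromacs/2021.3

    mpirun -np $SLURM_NTASKS gmx_mpi mdrun -ntomp 1 -s topol.tpr

    With the directives inside the script, it can be submitted with a plain sbatch command instead of the per-option flags shown above.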

    Documentation

    Gromacs documentation for 2021.3

  • Gromacs version 2022.4 is also available with plumed version 2.8.2

    This build allows the -plumed option to be used to specify PLUMED input files; a usage sketch is shown at the end of this section.

  • Module Name: gromacs/2022.4+plumed2.8.2

    Getting Started
    [user@biowulf]$  module load gromacs/2022.4+plumed2.8.2
    [+] Loading gcc  8.5.0  ...
    [+] Loading openmpi/4.1.4/CUDA-11.4 gcc-8.5.0  ...
    [+] Loading CUDA Toolkit  11.4.4  ...
    [+] Loading cmake 3.23.0  ...
    [+] Loading Gromacs 2022.4+plumed2.8.2  ...
    [+] Built with gcc 8.5, OpenMPI 4.1.4, Plumed 2.8.2
    
    #plumed options
    
    [user@biowulf]$ plumed --help
    ...
    Usage: plumed [options] [command] [command options]
      plumed [command] -h|--help: to print help for a specific command
    Options:
      [help|-h|--help]          : to print this help
      [--is-installed]          : fails if plumed is not installed
      [--has-mpi]               : fails if plumed is running without MPI
      [--has-dlopen]            : fails if plumed is compiled without dlopen
      [--load LIB]              : loads a shared object (typically a plugin library)
      [--standalone-executable] : tells plumed not to look for commands implemented as scripts
    Commands:
      plumed completion : dump a function usable for programmable completion
      plumed driver : analyze trajectories with plumed
      plumed driver-float : analyze trajectories with plumed (single precision version)
      plumed gen_example : construct an example for the manual that users can interact with
      plumed gentemplate : print out a template input for a particular action
      plumed info : provide informations about plumed
      plumed kt : print out the value of kT at a particular temperature
      plumed manual : print out a description of the keywords for an action in html
      plumed pathtools : print out a description of the keywords for an action in html
      plumed pdbrenumber : Modify atom numbers in a PDB, possibly using hybrid-36 coding
      plumed pesmd : Langevin dynamics on PLUMED energy landscape
      plumed simplemd : run lj code
      plumed sum_hills : sum the hills with  plumed
      plumed config : inquire plumed about how it was configure
      plumed mklib : compile a .cpp file into a shared library
      plumed newcv : create a new collective variable from a template
      plumed partial_tempering : scale parameters in a gromacs topology to implement solute or partial tempering
      plumed patch : patch an MD engine
      plumed selector : create lists of serial atom numbers
      plumed vim2html : convert plumed input file to colored html using vim syntax
    
    
    Downloading examples and generating input files
    #downloading example files 
    [user@biowulf]$ mkdir /data/$USER/gromacs && cd /data/$USER/gromacs
    [user@biowulf]$ tar xvzf /usr/local/apps/gromacs/tests/ADH_bench_systems.tar.gz
    [user@biowulf]$ cd ADH/adh_cubic
    
    #create MD input file using gromacs preprocessor
    [user@biowulf adh_cubic]$ gmx_mpi grompp -f pme_verlet.mdp -c conf.gro -p topol.top
    :-) GROMACS - gmx grompp, 2022.4-plumed_2.8.2 (-:
    
    Executable:   /usr/local/apps/gromacs/2022.4+plumed2.8.2/bin/gmx_mpi
    Data prefix:  /usr/local/apps/gromacs/2022.4+plumed2.8.2
    Working dir:  /gpfs/gsfs10/users/apptest4/ADH/adh_cubic
    Command line:
      gmx_mpi grompp -f pme_verlet.mdp -c conf.gro -p topol.top
    
    ...
    #now you have created the necessary .tpr file needed to run gromacs jobs
    
    
    Sample MPI batch script for Gromacs 2022.4+plumed2.8.2
    #!/bin/bash
    
    module load gromacs/2022.4+plumed2.8.2
    
    mpirun -np $SLURM_NTASKS gmx_mpi mdrun -ntomp 1 -s topol.tpr
    
    
    This version of Gromacs is available to run on CPUs. To submit these jobs, you can use the following:
    sbatch --ntasks=# --ntasks-per-core=1 --nodes=1 run.2022.4+plumed2.8.2  #ntasks < 16
    
    sbatch -p multinode --ntasks=# --ntasks-per-core=1 --nodes=1 run.2022.4+plumed2.8.2  #ntasks > 16
    
    
    where:
    --partition=multinode Submit to the multinode partition where all nodes are Infiniband-connected
    --constraint=x2695 All nodes should be x2695's.
    --ntasks=# the number of MPI processes you wish to run.
    --ntasks-per-core=1 ensures that Gromacs will only run 1 MPI process per physical core (i.e. it will not use both hyperthreaded CPUs). This is recommended for parallel jobs.
    -ntomp 1 uses only one OpenMP thread per MPI rank. This means that Gromacs will run using only MPI, which provides the best performance.
    --time=168:00:00 max walltime=168 hrs (1 week). See the section on chaining jobs below.
    --exclusive Allocate the nodes exclusively to this job (recommended for parallel jobs)
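
    To actually use the -plumed option, point mdrun at a PLUMED input file. The sketch below uses a minimal, hypothetical plumed.dat that only monitors one distance (the atom indices, stride, and output file name are arbitrary examples, not part of the ADH benchmark):

    # plumed.dat -- print one distance to COLVAR every 500 steps
    d1: DISTANCE ATOMS=1,100
    PRINT ARG=d1 STRIDE=500 FILE=COLVAR

    The corresponding mdrun line inside the batch script would then be:

    mpirun -np $SLURM_NTASKS gmx_mpi mdrun -ntomp 1 -s topol.tpr -plumed plumed.dat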

    Documentation

    PLUMED website
    PLUMED on github

  • Gromacs version 2022.4 is built with both CPU and GPU compatibility

  • Module Name: gromacs/2022.4

    Getting Started
    [user@biowulf]$ module load gromacs/2022.4
    [+] Loading gcc  8.5.0  ...
    [+] Loading openmpi/4.1.4/CUDA-11.4 gcc-8.5.0  ...
    [+] Loading CUDA Toolkit  11.4.4  ...
    [+] Loading cmake 3.23.0  ...
    [+] Loading Gromacs 2022.4  ...
    [+] Built with gcc 8.5 CUDA 11.4, OpenMPI 4.1.4
    
    [user@biowulf]$ mkdir /data/$USER/gromacs && cd /data/$USER/gromacs
    [user@biowulf]$ tar xvzf /usr/local/apps/gromacs/tests/ADH_bench_systems.tar.gz
    [user@biowulf]$ cd ADH/adh_cubic
    
    #create MD input file using gromacs preprocessor
    [user@biowulf adh_cubic]$ gmx grompp -f pme_verlet.mdp -c conf.gro -p topol.top 
    :-) GROMACS - gmx grompp, 2022.4 (-:
    ...
    Executable:   /usr/local/apps/gromacs/2022.4/bin/gmx
    Data prefix:  /usr/local/apps/gromacs/2022.4
    Working dir:  /gpfs/gsfs10/users/apptest4/ADH/adh_cubic
    Command line:
      gmx grompp -f pme_verlet.mdp -c conf.gro -p topol.top
    ...
    
    
    Sample MPI batch script for Gromacs 2022.4
    #!/bin/bash
    
    module load gromacs/2022.4
    
    mpirun -np $SLURM_NTASKS gmx_mpi mdrun -ntomp 1 -s topol.tpr
    
    
    For jobs running on CPUs, it is recommended to use homogeneous nodes with --constraint=x2695. GPUs are also available for version 2022.4; you can submit these jobs using the following:
    sbatch --ntasks=# --ntasks-per-core=1 --nodes=1 run.2022.4  #ntasks < 16
    
    sbatch -p multinode --ntasks=# --ntasks-per-core=1 --nodes=1 run.2022.4  #ntasks > 16
    
    sbatch -p gpu --gres=gpu:a100:1 --ntasks=1 --ntasks-per-core=1 run.2022.4  # for 1 a100 GPU
    
    
    where:
    --partition=multinode Submit to the multinode partition where all nodes are Infiniband-connected
    --constraint=x2695 All nodes should be x2695's.
    --ntasks=# the number of MPI processes you wish to run.
    --ntasks-per-core=1 ensures that Gromacs will only run 1 MPI process per physical core (i.e. it will not use both hyperthreaded CPUs). This is recommended for parallel jobs.
    -ntomp 1 uses only one OpenMP thread per MPI rank. This means that Gromacs will run using only MPI, which provides the best performance.
    --time=168:00:00 max walltime=168 hrs (1 week). See the section on chaining jobs below.
    --exclusive Allocate the nodes exclusively to this job (recommended for parallel jobs)
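
    For the GPU case, a single-GPU submission script might look like the sketch below (the CPU count per task is illustrative, and -nb gpu / -pme gpu are the standard mdrun offload options rather than anything specific to this build):

    #!/bin/bash
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:a100:1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8
    #SBATCH --time=24:00:00

    module load gromacs/2022.4

    # one MPI rank driving one A100, with nonbonded and PME work offloaded to the GPU
    gmx_mpi mdrun -ntomp $SLURM_CPUS_PER_TASK -nb gpu -pme gpu -s topol.tpr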

    Documentation

    Gromacs documentation for 2022.4

  • Gromacs version 2024 has three available versions: two are version 2024.0, one compiled with Intel 2022 and one compiled with gcc.

    Gromacs version 2024.1 is also available, built with the new Intel 2024 compiler.

  • Module Name: gromacs/2024-gcc11.3 (gcc)
  • Module Name: gromacs/2024-intel2022 (Intel)
  • Module Name: gromacs/2024.1-intel2024 (Intel, default)
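
    To use one of the non-default builds, load it by its full module name, e.g.:

    [user@biowulf]$ module avail gromacs
    [user@biowulf]$ module load gromacs/2024-gcc11.3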

    Getting Started
    [user@biowulf]$ module load gromacs
    [+] Loading CUDA Toolkit  12.1.0  ...
    [+] Loading Intel 2024.0.1.46  Compilers ...
    [+] Loading openmpi/4.1.6/intel-2024.0.1.46  ...
    [+] Loading Gromacs 2024.1-intel2024  ...
    [+] Built with OpenMPI 4.1.6, gcc 11.3.0, Intel 2024.0.1.46
    [user@biowulf]$ mkdir /data/$USER/gromacs && cd /data/$USER/gromacs
    [user@biowulf]$ tar xvzf /usr/local/apps/gromacs/tests/ADH_bench_systems.tar.gz
    [user@biowulf]$ cd ADH/adh_cubic
    
    #create MD input file using gromacs preprocessor
    [user@biowulf adh_cubic]$ gmx grompp -f pme_verlet.mdp -c conf.gro -p topol.top 
                    :-) GROMACS - gmx grompp, 2024.1 (-:
    
    Executable:   /usr/local/apps/gromacs/2024.1-intel2024.0/bin/gmx
    Data prefix:  /usr/local/apps/gromacs/2024.1-intel2024.0
    Working dir:  /vf/users/ashdownht/gromacs_all/g-copy/adh_cubic
    Command line:
      gmx grompp -f pme_verlet.mdp -c conf.gro -p topol.top
    ...
    
    
    Sample MPI batch script for Gromacs 2024.1
    #!/bin/bash
    
    module load gromacs/2024.1-intel2024
    
    mpirun -np $SLURM_NTASKS gmx_mpi mdrun -ntomp 1 -s topol.tpr
    
    
    For jobs running on CPUs, it is recommended to use homogeneous nodes with --constraint=x2695. GPUs are also available for version 2024.1; you can submit these jobs using the following:
    sbatch --ntasks=# --ntasks-per-core=1 --nodes=1 run.2024.1  #ntasks < 16
    
    sbatch -p multinode --ntasks=# --ntasks-per-core=1 --nodes=1 run.2024.1  #ntasks > 16
    
    sbatch -p gpu --gres=gpu:a100:1 --ntasks=1 --ntasks-per-core=1 run.2024.1  # for 1 a100 GPU
    
    
    where:
    --partition=multinode Submit to the multinode partition where all nodes are Infiniband-connected
    --constraint=x2695 All nodes should be x2695's.
    --ntasks=# the number of MPI processes you wish to run.
    --ntasks-per-core=1 ensures that Gromacs will only run 1 MPI process per physical core (i.e. it will not use both hyperthreaded CPUs). This is recommended for parallel jobs.
    -ntomp 1 uses only one OpenMP thread per MPI rank. This means that Gromacs will run using only MPI, which provides the best performance.
    --time=168:00:00 max walltime=168 hrs (1 week). See the section on chaining jobs below.
    --exclusive Allocate the nodes exclusively to this job (recommended for parallel jobs)
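
    For a quick interactive test of this build before submitting batch jobs, one possible workflow (the sinteractive options, node name, and step count are illustrative) is:

    # allocate an interactive session with one A100, then run a short test
    [user@biowulf]$ sinteractive --gres=gpu:a100:1 --cpus-per-task=8 --mem=16g
    [user@cn1234]$ module load gromacs/2024.1-intel2024
    [user@cn1234]$ gmx mdrun -ntomp $SLURM_CPUS_PER_TASK -nb gpu -s topol.tpr -nsteps 10000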

    Documentation

    Gromacs documentation for 2024.1

  • Replica Exchange

    Sample replica exchange batch script (thanks to Mingzhen Zhang)

    #!/bin/bash
    
    module load gromacs/2018
    
    cd $SLURM_SUBMIT_DIR
    
    mpirun -np $SLURM_NTASKS `which mdrun_mpi` -ntomp 1 -s cmd_.tpr -maxh 0.50 -resethway -noconfout -cpi state.cpt -noappend -multi 48 -replex 1000
    

    Submit with:

    sbatch --partition multinode --constraint=x2695 --job-name=MyJob --ntasks=64 --ntasks-per-core=1 --exclusive   myjobscript
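
    The -multi 48 flag makes mdrun read 48 separate run input files; with -s cmd_.tpr it looks for cmd_0.tpr through cmd_47.tpr. A sketch for generating them with grompp (the per-replica .mdp files, differing only in reference temperature, are hypothetical placeholders):

    # build one .tpr per replica from per-temperature .mdp files
    for i in $(seq 0 47); do
        gmx grompp -f replica_${i}.mdp -c conf.gro -p topol.top -o cmd_${i}.tpr
    done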
    

    On GPUs

    Use 'freen' to see GPUs available.

    biowulf% freen
    
    .......Per-Node Resources......
    Partition   FreeNds  FreeCPUs   FreeGPUs Cores CPUs GPUs Mem  Disk Features
    -------------------------------------------------------------------------------------------------------
    ...
    multinode   320/587  20168/32872         28    56        247g 400g cpu56,core28,g256,ssd400,x2695,ibfdr
    multinode   15 /620  2254 /34720         28    56        247g 800g cpu56,core28,g256,ssd800,x2680,ibfdr
    gpu (v100x)  3 /52   2262 /3744 43 /208  36    72   4    373g 1600g cpu72,core36,g384,ssd1600,x6140,ibhdr,gpuv100x
    gpu (a100)   0 /33   890  /2112  0 /132  32    64   4    247g 3200g cpu64,core32,g256,ssd3200,e7543p,ibhdr200,gpua100
    gpu (p100)   3 /46   776  /2576 42 /184  28    56   4    121g 650g cpu56,core28,g128,ssd650,x2680,ibfdr,gpup100
    gpu (k80)    4 /19   224  /1064 16 /76   28    56   4    247g 400g cpu56,core28,g256,ssd400,x2695,ibfdr,gpuk80
    gpu (k80)   21 /65   2070 /3640 138/260  28    56   4    247g 800g cpu56,core28,g256,ssd800,x2680,ibfdr,gpuk80
    gpu (v100)   0 /7    298  /392   5 /28   28    56   4    121g 800g cpu56,core28,g128,ssd800,x2680,ibfdr,gpuv100
    
    ...
    
    Chaining jobs

    The max walltime on the multinode partition is 10 days. (type 'batchlim' to see the CPU and walltime limits on all partitions). Thus, jobs should be designed to run for a week or so, save a checkpoint file, and submit a new job starting from that checkpoint.

    A reasonable strategy would be to set up a job to run for a week or less by setting the number of steps appropriately, and then, at the end of the job, have it resubmit itself to continue the simulation. Below is a sample batch script:

    #!/bin/bash
    # this script is called Run.ib
    
    module load gromacs/2018.3
    
    cd /path/to/my/dir
    
    mpirun -np $SLURM_NTASKS `which mdrun_mpi` -ntomp 1 -s topol.tpr -maxh 0.50 -resethway -noconfout -nsteps 1000
    
    # use gmx convert-tpr (the replacement for the old tpbconv tool) to create a new topol.tpr extended by 500 ps
    gmx convert-tpr -s topol.tpr -extend 500 -o topol2.tpr
    
    #move the newly created topol.tpr into place
    mv topol.tpr topol.tpr.prev; mv topol2.tpr topol.tpr
    
    #resubmit this script
    sbatch --partition multinode --constraint=x2680 --job-name=gmx  --ntasks=# --ntasks-per-core=1  --time=168:00:00 --exclusive  Run.ib
    
    More information at Extending Simulations on the Gromacs site.

    If a Gromacs job is terminated unexpectedly (for example, the walltime limit was hit before the mdrun completed), it is simple to restart. The state.cpt file contains all the information necessary to continue the simulation. Use the '-cpi' and '-append' options to mdrun, which will append to existing energy, trajectory and log files. For example:

    mpirun -n $np `which mdrun_mpi` -s topol.tpr -cpi state.cpt -append
    

    More information at Doing Restarts on the Gromacs website.

    Tips for Best Performance

    Make sure you request homogeneous resources. The 'freen' command will show several kinds of nodes in the multinode partition. Pick one (depending on CPU speed or availability), and submit to only that type of node by using the 'constraint' flag, e.g.

    #SBATCH --constraint=x2695
    
    Running on a mix of node types will effectively mean running on the slowest type of node.

  • The bottleneck for MPI-based MD programs is inter-node communication, so you should submit to as few nodes as possible, utilizing all the cores on each node. For example, suppose you want to run roughly 100 MPI tasks. 'freen' shows nodes with 16 or 28 cores. If you run on 28-core nodes, you can utilize 4 full nodes with:
    #SBATCH --ntasks=112
    #SBATCH --ntasks-per-core=1
    #SBATCH --nodes=4
    #SBATCH --exclusive
    #SBATCH --constraint=nodetype
    

  • Check the job utilization with 'jobload' -- ideally you should see all the allocated physical cores fully utilized. Since only one process runs per physical core and each core has two hyperthreaded CPUs, jobload will report 50% utilization on each of the allocated nodes.
    Example of a well set up job
    
    biowulf% jobload -j 11111111
               JOBID            TIME            NODES  CPUS  THREADS   LOAD       MEMORY
                         Elapsed / Wall               Alloc   Active           Used /     Alloc
          11111111    0-04:05:52 /  2-02:00:00 cn1135    56       28   50%     0.8 /    4.0 GB
                      0-04:05:52 /  2-02:00:00 cn2170    56       28   50%     0.8 /    4.0 GB
                      0-04:05:52 /  2-02:00:00 cn2171    56       28   50%     0.8 /    4.0 GB
                      0-04:05:52 /  2-02:00:00 cn2172    56       28   50%     0.8 /    4.0 GB
      
    Example of a poorly set up job
    
    biowulf% jobload -j 11111111
               JOBID            TIME            NODES  CPUS  THREADS   LOAD       MEMORY
                         Elapsed / Wall               Alloc   Active           Used /     Alloc
          11111111    0-04:05:52 /  2-02:00:00 cn1135     4        4   100%     0.3 /    4.0 GB
                      0-04:05:52 /  2-02:00:00 cn2170    16       13    81%     0.8 /    4.0 GB
                      0-04:05:52 /  2-02:00:00 cn2171    16       13    81%     0.8 /    4.0 GB
                      0-04:05:52 /  2-02:00:00 cn2172    48       13    27%     0.8 /    4.0 GB
                      0-04:05:52 /  2-02:00:00 cn3024    44       13    30%     0.8 /    4.0 GB
                      0-04:05:52 /  2-02:00:00 cn3025    44       13    30%     0.8 /    4.0 GB
     
    Benchmarks

    See the Gromacs benchmarks page.

    Documentation

    Gromacs website
    mdrun documentation for v5.0.4