GROMACS (www.gromacs.org) is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins and lipids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.
Gromacs can multi-thread as well as use MPI. For small jobs, e.g. 8 CPUs on a single node, multi-threading works almost as well as MPI. For larger jobs, it is best to use MPI. See the Benchmarks page for details.
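For example (a minimal sketch; 'topol.tpr' stands in for your own run input file):

# single node: 8 thread-MPI ranks, no mpirun needed
gmx mdrun -ntmpi 8 -ntomp 1 -s topol.tpr

# multiple nodes: one MPI rank per allocated task, using the MPI-enabled binary
mpirun -np $SLURM_NTASKS gmx_mpi mdrun -ntomp 1 -s topol.tpr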
Specifying a homogeneous set of nodes
The 'multinode' partition, to which all jobs that require more than a single node must be submitted, is heterogeneous. For efficient parallel jobs, you need to ensure that you request nodes of a single CPU type. For example, at the time of writing, the 'freen' command displays:
biowulf% freen
...
multinode    65/466    3640/26096   28   56   248g   400g   cpu56,core28,g256,ssd400,x2695,ibfdr
multinode     4/190     128/6080    16   32    60g   800g   cpu32,core16,g64,ssd800,x2650,ibfdr
multinode   312/539   17646/30184   28   56   250g   800g   cpu56,core28,g256,ssd800,x2680,ibfdr
...

These lines indicate that there are 3 kinds of nodes in the multinode partition. You should submit your job exclusively to one kind of node by specifying --constraint=x2695 or --constraint=x2680, as in the examples below. (Note that Gromacs 2018.3 and 2020.2 will not currently run on the x2650 nodes.)
Sample MPI batch script for Gromacs 2022.4
#!/bin/bash

module load gromacs/2022.4
mpirun gmx_mpi mdrun -ntomp 1 -s topol.tpr

Gromacs will use GPUs if available, but will run on CPUs if not. Therefore, the same batch script will work for both GPUs and CPUs. Sample submission commands:
sbatch --ntasks=# --ntasks-per-core=1 --nodes=1 run.2022.4                   # CPUs
sbatch -p gpu --gres=gpu:p100:1 --ntasks=1 --ntasks-per-core=1 run.2022.4   # for 1 p100 GPU

where 'p100' can be replaced by whichever GPU type is desired. Use 'freen' to see GPUs available.
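For instance, a simple filter on the freen output will show the GPU node types (a sketch; adjust the pattern as needed):

freen | grep -i gpu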
Sample MPI batch script for Gromacs 2021.3 and earlier:
#!/bin/bash

module load gromacs/2018
mpirun -np $SLURM_NTASKS `which mdrun_mpi` -ntomp 1 -s ion_channel.tpr -maxh 0.50 \
    -resethway -noconfout -nsteps 1000

Submit this job with:
sbatch --partition multinode --constraint=x2695 --job-name=gmx --ntasks=# \
    --ntasks-per-core=1 --time=168:00:00 --exclusive jobscript

where
--partition multinode | Submit to the multinode partition where all nodes are Infiniband-connected |
--constraint=x2695 | All nodes should be x2695's. |
--ntasks=# | The number of MPI processes you wish to run (see the worked example after this table). |
--ntasks-per-core=1 | Ensures that Gromacs will only run 1 MPI process per physical core (i.e. will not use both hyperthreads of a core). This is recommended for parallel jobs. |
-ntomp 1 | Uses only one OpenMP thread per MPI process. This means that Gromacs will run using only MPI, which provides the best performance. |
--time=168:00:00 | max walltime=168 hrs (1 week). See the section on chaining jobs below. |
--exclusive | Allocate the nodes exclusively to this job (recommended for parallel jobs) |
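As a worked example based on the freen output above: the x2695 nodes have 28 physical cores each, so a job spanning 4 whole nodes would request 112 MPI tasks:

sbatch --partition multinode --constraint=x2695 --job-name=gmx --ntasks=112 \
    --ntasks-per-core=1 --time=168:00:00 --exclusive jobscript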
Sample batch script for Gromacs 4.6.5 (thanks to Mingzhen Zhang)
#!/bin/bash

module load gromacs/2018
cd $SLURM_SUBMIT_DIR
mpirun -np $SLURM_NTASKS `which mdrun_mpi` -ntomp 1 -s cmd_.tpr -maxh 0.50 \
    -resethway -noconfout -cpi state.cpt -noappend -multi 48 -replex 1000
Submit with:
sbatch --partition multinode --constraint=x2695 --job-name=MyJob --ntasks=64 --ntasks-per-core=1 --exclusive myjobscript
GPU support is built into Gromacs 5.* and later. Below is a sample batch script running the adh_cubic job from the Gromacs GPU documentation. The files for this job are available in /usr/local/apps/gromacs/ADH_bench_systems.tar.gz (untar the file and look in the adh_cubic directory).
Note that only the following versions on Biowulf are built with CUDA/GPU support:
Sample batch script:
#!/bin/bash

module load gromacs/2018.3
mkdir -p /data/$USER/gromacs
cd /data/$USER/gromacs
tar xvzf /usr/local/apps/gromacs/adh_cubic.tar.gz
cd adh_cubic
mpirun -np $SLURM_NTASKS --mca btl_openib_if_exclude "mlx4_0:1" `which mdrun_mpi` \
    -ntomp $SLURM_CPUS_PER_TASK -s topol.tpr
Submit this job with, for example:
sbatch --partition=gpu --gres=gpu:k80:2 --ntasks=2 --ntasks-per-core=1 --cpus-per-task=1 --time=HH:MM:SS jobscript
The above command will submit the job to 2 GPUs.
-ntomp 1 | sets 1 CPU thread per MPI (GPU) process. In our tests a 1:1 CPU:GPU ratio gave the best performance (see benchmarks). |
--mca btl_openib_if_exclude "mlx4_0:1" | prevents a warning about OpenFabrics from appearing in your output. You can also leave it out and live with the warning :-). |
--partition=gpu | submit to the GPU partition |
--gres=gpu:k80:2 | Resource=gpu, Resource type=k80, Count=2 (2 GPUs). Note that the count is required even if you are submitting to a single GPU. |
--ntasks=2 | Number of MPI tasks to spawn. |
--ntasks-per-core=1 | Run only 1 MPI task per physical core. This is best for most MD jobs |
--time=HH:MM:SS | walltime to be allocated for the job -- HH hours, MM minutes, SS seconds |
--cpus-per-task=1 | Number of threads per MPI task. You should run your own benchmarks to determine the best values for ntasks and cpus-per-task (see the sketch after this table). |
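One simple way to run such benchmarks is to submit the same job script with a few different values of --cpus-per-task (a sketch; the GPU type, task counts, and time limit are placeholders to adjust for your system):

for c in 1 2 4; do
    sbatch --partition=gpu --gres=gpu:k80:2 --ntasks=2 --ntasks-per-core=1 \
        --cpus-per-task=$c --time=1:00:00 jobscript
done

The batch script above picks up each value through -ntomp $SLURM_CPUS_PER_TASK.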
The max walltime on the multinode partition is 10 days. (type 'batchlim' to see the CPU and walltime limits on all partitions). Thus, jobs should be designed to run for a week or so, save a checkpoint file, and submit a new job starting from that checkpoint.
A reasonable strategy would be to set up a job to run for a week or less by setting the number of steps appropriately, and then, at the end of the job, have it resubmit itself to continue the simulation. Below is a sample batch script:
#!/bin/bash
# this script is called Run.ib

module load gromacs/2018.3
cd /path/to/my/dir

mpirun -np $SLURM_NTASKS `which mdrun_mpi` -ntomp 1 -s topol.tpr -maxh 0.50 \
    -resethway -noconfout -nsteps 1000

# use gmx convert-tpr (the successor to tpbconv) to create a new topol.tpr
# with an increased number of steps
gmx convert-tpr -s topol.tpr -extend 500 -o topol2.tpr

# move the newly created topol.tpr into place
mv topol.tpr topol.tpr.prev; mv topol2.tpr topol.tpr

# resubmit this script
sbatch --partition multinode --constraint=x2680 --job-name=gmx --ntasks=# \
    --ntasks-per-core=1 --time=168:00:00 --exclusive Run.ib

More information at Extending Simulations on the Gromacs site.
If a Gromacs job is terminated unexpectedly (for example, the walltime limit was hit before the mdrun completed), it is simple to restart. The state.cpt file contains all the information necessary to continue the simulation. Use the '-cpi' and '-append' options to mdrun, which will append to existing energy, trajectory and log files. For example:
mpirun -n $np `which mdrun_mpi` -s topol.tpr -cpi state.cpt -append
More information at Doing Restarts on the Gromacs website.
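The restart and chaining approaches can be combined, so that each job continues from the checkpoint and resubmits itself near the end of its walltime. A sketch (the script name Chain.ib is hypothetical; in production use, add a check for simulation completion before resubmitting):

#!/bin/bash
# hypothetical script name: Chain.ib
module load gromacs/2018.3
cd /path/to/my/dir

# continue from the checkpoint, appending to the existing output files;
# -maxh 167 makes mdrun write a checkpoint and exit cleanly before the
# 168-hour walltime limit
mpirun -np $SLURM_NTASKS `which mdrun_mpi` -ntomp 1 -s topol.tpr \
    -cpi state.cpt -append -maxh 167

# resubmit this script to continue the simulation
sbatch --partition multinode --constraint=x2680 --job-name=gmx --ntasks=# \
    --ntasks-per-core=1 --time=168:00:00 --exclusive Chain.ib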
Running on a mix of node types will effectively mean running on the slowest type of node, so constrain your job to a single node type, e.g.

#SBATCH --constraint=x2695
A well set up multinode job allocates whole nodes of a single type, e.g.:

#SBATCH --ntasks=112
#SBATCH --ntasks-per-core=1
#SBATCH --nodes=4
#SBATCH --exclusive
#SBATCH --constraint=nodetype
Example of a well set up job:

biowulf% jobload -j 11111111
       JOBID            TIME                NODES   CPUS  THREADS   LOAD      MEMORY
                  Elapsed / Wall                   Alloc   Active          Used / Alloc
    11111111   0-04:05:52 / 2-02:00:00     cn1135     56       28    50%   0.8 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn2170     56       28    50%   0.8 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn2171     56       28    50%   0.8 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn2172     56       28    50%   0.8 / 4.0 GB

This job runs one active thread on each of the 28 physical cores of its 4 identical nodes; the 50% load simply reflects that the second hyperthread of each core is deliberately left idle (--ntasks-per-core=1).
Example of a poorly set up job:

biowulf% jobload -j 11111111
       JOBID            TIME                NODES   CPUS  THREADS   LOAD      MEMORY
                  Elapsed / Wall                   Alloc   Active          Used / Alloc
    11111111   0-04:05:52 / 2-02:00:00     cn1135      4        4   100%   0.3 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn2170     16       13    81%   0.8 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn2171     16       13    81%   0.8 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn2172     48       13    27%   0.8 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn3024     44       13    30%   0.8 / 4.0 GB
               0-04:05:52 / 2-02:00:00     cn3025     44       13    30%   0.8 / 4.0 GB

This job is spread across 6 nodes with uneven CPU allocations, leaving most of the allocated CPUs idle.
See the Benchmarks page.