Biowulf High Performance Computing at the NIH
Amber on Biowulf

AMBER (Assisted Model Building with Energy Refinement) is a package of molecular simulation programs. AMBER contains a large number of modules; note that only sander and pmemd are parallelized.

There are several versions of Amber 18 and Amber 16 available. The module names are as follows:

amber/18: built with the AVX2 instruction set, Intel 2017.4.196 compilers, OpenMPI 2.1.2. This version has better performance and runs on all of the nodes in norm and multinode, but will not run on the oldest nodes (x2670 and x5660) in the quick queue.
amber/18-gpu: also sets up the CUDA library paths. Intended for GPU runs.
amber/18.generic: built with the Intel 2017.4.196 compilers, OpenMPI 2.1.2. This version is slower because it does not use AVX2, but will run on any node in the cluster, including the oldest quick queue nodes.
amber/16: built with the AVX2 instruction set, Intel 2017.4.196 compilers, OpenMPI 2.1.2. This version has better performance and runs on all of the nodes in norm and multinode, but will not run on the oldest nodes (x2670 and x5660) in the quick queue.
amber/16-gpu: also sets up the CUDA library paths. Intended for GPU runs.
amber/16.generic: built with the Intel 2017.4.196 compilers, OpenMPI 2.1.2. This version is slower because it does not use AVX2, but will run on any node in the cluster, including the oldest quick queue nodes.
Thus, if you wish to run on any node in the quick queue, use one of the amber/*.generic modules.
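
For example, a session or batch script intended for the oldest quick-queue nodes might load the generic build (a minimal sketch; substitute amber/16.generic if you need Amber 16):

module load amber/18.generic     # non-AVX2 build; runs on any node, including the oldest quick-queue nodes
module list                      # confirm which Amber build is loaded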

LEaP is a graphical builder of input files for AMBER modules. LEaP can be used via the Xwindows graphical interface xleap, or the terminal version tleap. To run xleap,

  1. Open an Xwindows session to Biowulf or Helix. (More information about Xwindows on Macs, Windows, and Unix desktop machines.)
  2. Load the module for the version you want, and then type 'xleap'.
    biowulf% module load amber/16
    biowulf% xleap
    
    You should see the xleap window appear, in which you can type any LEaP commands.
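
The terminal version, tleap, can also be driven non-interactively with a command file. Below is a minimal sketch; the PDB file name, the unit name 'mol', and the choice of leaprc are illustrative and depend on your system and Amber version:

# This file is build.leap -- a minimal LEaP command file
# load a protein force field (the leaprc name varies between Amber versions)
source leaprc.protein.ff14SB
# read the starting structure from a PDB file
mol = loadpdb mymolecule.pdb
# write the Amber topology and coordinate files
saveamberparm mol prmtop inpcrd
quit

Run it with:

module load amber/18
tleap -f build.leap
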
See the AMBER tutorials for more information.
Batch job on Biowulf

For basic information about setting up an Amber job, see the Amber manual and the Amber tutorials .

The Amber executables can run in parallel on all Biowulf computational nodes. However, benchmark runs indicate that Amber jobs scale best when confined to the CPUs of a single node. Therefore we recommend that users run Amber jobs on the regular norm partition nodes or on the GPU nodes. To determine the most appropriate number of CPUs to allocate, you should run your own benchmarks.

Sample script

#!/bin/bash
# This file is amber.run
#

module load amber/16

cd /data/$USER/amber/myproject
mpirun  $AMBERHOME/bin/pmemd.MPI -O -i mdin -o mdout -inf mdinfo -x mdcrd -r restrt

Submit with, for example:
sbatch --ntasks=8 --ntasks-per-core=1 --nodes=1 --time=168:00:00 --exclusive amber.run
This job will run on 8 cores of a single node and will not use hyperthreaded cores. The maximum walltime is set to 168 hrs, i.e. one week. See the section on walltime limits below.
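
To find the most appropriate CPU count for your own system, one simple approach is to submit the same script at several task counts and compare the ns/day reported at the end of each run. The sketch below is illustrative only: the task counts and job names are arbitrary, and each test should be run in its own copy of the working directory so that the output files do not overwrite each other.

#!/bin/bash
# This file is scaling-test.sh -- a sketch of a CPU scaling test for the amber.run script above
for n in 2 4 8 16
do
    sbatch --job-name=amber_scale_$n --ntasks=$n --ntasks-per-core=1 \
           --nodes=1 --time=04:00:00 amber.run
done
# When the jobs finish, compare the ns/day reported in each mdout/mdinfo file.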
On a single GPU

Amber runs extremely fast on a single GPU. Since GPU performance is significantly better than CPU performance, it is worth running most Amber jobs on a single GPU (see benchmarks). Larger molecular systems may benefit from running on more than one GPU, but please run your own benchmarks to make sure (and send them to us!).

Set up your Amber batch script along the following lines:

#!/bin/bash

cd /data/$USER/mydir

module load amber/16-gpu

$AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout -inf mdinfo -x mdcrd -r restrt

Submit this job with:

sbatch --partition=gpu --gres=gpu:k80:1 jobscript       (1 K80 GPU)
or
sbatch --partition=gpu --gres=gpu:p100:1 jobscript      (1 P100 GPU)
or
sbatch --partition=gpu --gres=gpu:v100:1 jobscript      (1 V100 GPU)
where
--partition=gpu     submit to the GPU partition
--gres=gpu:k80:1    allocate a single K80 GPU for this job (substitute p100 or v100 for the other GPU types)

The jobload command will show 1 CPU being used. The output from Amber will indicate the GPU usage. The 'nvidia-smi' command can also be used to check whether the Amber executables are using the GPU (as described in the section below).

On 2 GPUs

It is not possible to run a single Amber job on both K20x GPUs of a node, since those two GPUs do not have peer-to-peer communication (see the Amber GPU page for an explanation of peer-to-peer communication).

However, on the K80 nodes, the GPUs do have peer-to-peer communication. It is therefore possible to run on 2 GPUs on a K80 node. However, the performance in most cases is worse on 2 GPUs than on a single GPU. If you plan to run a job on 2 GPUs, please run benchmarks first and verify that the performance is better on 2 GPUs than on 1. (Benchmarks). Note that the batch system will set the variable $CUDA_VISIBLE_DEVICES to the allocated GPUs.
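
For example, you can record which GPUs were assigned by adding a line like the following to your batch script (the echo text is arbitrary):

echo "Allocated GPUs: $CUDA_VISIBLE_DEVICES"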

Sample batch script:

#!/bin/bash

module load amber/16-gpu
cd /path/to/your/dir
mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i in1 -o out1 -inf info1 -x crd1 -r r1 
Submit with:
sbatch --partition=gpu --gres=gpu:k80:2  --time=12:00:00  jobscript

You can check the behaviour of your job with the 'nvidia-smi' utility. Determine the GPU node on which your job is running via jobload or sjobs. Suppose your job is on node cn0626, and is using 2 GPUs:

biowulf% rsh cn0626 nvidia-smi
Mon Jan 16 17:52:13 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:83:00.0     Off |                  Off |
| N/A   41C    P0   115W / 149W |    145MiB / 12205MiB |     75%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:84:00.0     Off |                  Off |
| N/A   27C    P8    33W / 149W |      0MiB / 12205MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0000:8A:00.0     Off |                  Off |
| N/A   70C    P0   128W / 149W |    145MiB / 12205MiB |     80%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0000:8B:00.0     Off |                  Off |
| N/A   45C    P8    33W / 149W |      0MiB / 12205MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     51187    C   ...cal/apps/amber/amber16/bin/pmemd.cuda.MPI    61MiB |
|    0     51188    C   ...cal/apps/amber/amber16/bin/pmemd.cuda.MPI    79MiB |
|    2     51187    C   ...cal/apps/amber/amber16/bin/pmemd.cuda.MPI    79MiB |
|    2     51188    C   ...cal/apps/amber/amber16/bin/pmemd.cuda.MPI    61MiB |
+-----------------------------------------------------------------------------+
The GPU numbers reported by 'nvidia-smi' may not match the GPUs you specified with the 'CUDA_VISIBLE_DEVICES' variable.

Walltime limits and chaining jobs

Walltime limits are set on most Biowulf partitions. Type 'batchlim' to see the current walltime limits, or see the systems status page. Note that the default walltime on the norm partition is 4 hrs, but you can extend this to 10 days. Amber jobs should be designed to run for a week or so, save a checkpoint file, and submit a new job starting from that checkpoint.

An example batch script is below. This script runs a single simulation, saves a copy of the output files, and then resubmits a new job starting from Amber's 'restart' file.

#!/bin/bash
# this file is called amber.run

module load  amber/16
module list

echo "Running on $SLURM_NTASKS corse"



# rename the restart file to the coordinate filename
mv restrt inpcrd

#run sander
mpirun -np $SLURM_NTASKS `which sander.MPI` -O -i mdin -c inpcrd -p prmtop -r restrt -x traj -e ene -o mdout

#keep a copy of the output from this run
 mv mdout  md_$run.out
 mv traj  md_$run.trj
 mv ene  md_$run.ene
 cp restrt  md_$run.rst

# if fewer than 10 runs have been performed, increase the run number and submit the next job
if (( run < 10 ))
   then
     run=$((run + 1))
     sbatch --export=ALL,run=$run --ntasks=8 --ntasks-per-core=1 --time=168:00:00 --exclusive amber.run
fi

To submit this job, copy the original input coordinate file to 'restrt' for the first run, and then submit.

cp inpcrd restrt               
sbatch --ntasks=8 --ntasks-per-core=1 --time=168:00:00 --exclusive amber.run

Benchmarks

Based on the benchmarks, it is highly recommended that you run Amber on a GPU node.

Full benchmark details

Documentation

Amber 16 reference manual
Combined Amber 14 and AmberTools 15 reference manual
Amber website