Amber on Biowulf

AMBER (Assisted Model Building with Energy Refinement) is a package of molecular simulation programs. AMBER contains a large number of modules; note that only sander and pmemd are parallelized.

On Helix

The sander and pmemd modules are compute-intensive and should not be run on Helix. The other Amber executables (e.g. autopdb) can be used on Helix to generate input for Biowulf runs. You can also run XLeap on Helix.

LEaP is a graphical builder of input files for AMBER modules. LEaP can be used via the Xwindows graphical interface xleap, or the terminal version tleap. To run xleap,

  1. Open an Xwindows session to Biowulf or Helix. (More information about Xwindows on Macs, Windows, and Unix desktop machines.)
  2. Load the module for the version you want, and then type 'xleap'.
    biowulf% module load amber/16
    biowulf% xleap
    
    You should see the xleap window appear, in which you can type any LEaP commands.
See the AMBER tutorials for more information.
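The same kind of system setup can also be scripted with the terminal version, tleap. Below is a minimal sketch only; the PDB name, force field, and output file names are placeholders for illustration, not part of a specific Biowulf recipe.

#!/bin/bash
# Sketch: build Amber topology/coordinate files with tleap from a command file.
# protein.pdb, prmtop and inpcrd are placeholder names for this example.
module load amber/16

cat > leap.in <<'EOF'
source leaprc.protein.ff14SB      # load a protein force field
mol = loadpdb protein.pdb         # read the starting structure
solvatebox mol TIP3PBOX 10.0      # add a 10 Angstrom box of TIP3P water
addions mol Na+ 0                 # neutralize the system
saveamberparm mol prmtop inpcrd   # write topology and coordinate files
quit
EOF

tleap -f leap.in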
Batch job on Biowulf

For basic information about setting up an Amber job, see the Amber manual and the Amber tutorials.

The Amber executables can run in parallel on all Biowulf computational nodes. However, benchmark runs indicate that Amber jobs scale best to the CPUs on a single node. Therefore we recommend that users run Amber jobs on the regular norm partition nodes or on the GPU nodes. To determine the most appropriate number of CPUs to allocate, you should run your own benchmarks.
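One simple way to benchmark, sketched below, is to submit the same short simulation at several CPU counts and compare the ns/day rate reported in the timing section of each mdout. The batch script here is assumed to be the sample script in the next section; make sure each run writes to its own directory or output names so the files are not overwritten.

#!/bin/bash
# Benchmark sketch: submit the same short Amber run at several CPU counts.
# (Assumes each run writes to its own directory/output names so files do not clash.)
for n in 2 4 8 16; do
    sbatch --ntasks=$n --ntasks-per-core=1 --nodes=1 --time=4:00:00 amber.run
done
# When the jobs finish, compare the "ns/day" value in the timing summary of each mdout.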

Sample script

#!/bin/bash
# This file is amber.run
#

module load amber/16

cd /data/$USER/amber/myproject
# run pmemd in parallel; mpirun takes the MPI task count from the Slurm allocation
mpirun $AMBERHOME/bin/pmemd.MPI -O -i mdin -o mdout -inf mdinfo -x mdcrd -r restrt

Submit with, for example:
sbatch --ntasks=8 --ntasks-per-core=1 --nodes=1 --time=168:00:00 --exclusive amber.run
This job will run on 8 cores of a single node and will not use hyperthreaded cores. The walltime is set to 168 hrs, i.e. one week. See the section on walltime limits below.
On a single GPU

Amber runs extremely fast on a single GPU. Since GPU performance is significantly better than CPU performance, it is worth running most Amber jobs on a single GPU (see the benchmarks). Larger molecular systems may benefit from running on more than one GPU, but please run your own benchmarks to make sure (and send them to us!).
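A quick way to compare such runs after they finish is to pull the ns/day figure from the timing summary that pmemd writes to mdout. For example (the directory names here are purely illustrative):

grep "ns/day" run_1gpu/mdout run_2gpu/mdout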

Set up your Amber batch script along the following lines:

#!/bin/bash

cd /data/$USER/mydir

module load amber/16-gpu

# pmemd.cuda runs on the single GPU allocated to the job
$AMBERHOME/bin/pmemd.cuda -O -i mdin -o mdout -inf mdinfo -x mdcrd -r restrt

Submit this job with:

sbatch --partition=gpu --gres=gpu:k20x:1  jobscript
or
sbatch --partition=ccrgpu --gres=gpu:k80 jobscript        (NCI CCR users)
where
--partition=gpu     submits the job to the GPU partition
--gres=gpu:k20x:1   allocates a single K20x GPU for this job

The jobload command will show 1 CPU being used. The output from Amber will indicate the GPU usage. The 'nvidia-smi' command can also be used to check whether the Amber executables are using the GPU (as described in the section below).

On 2 GPUs

It is not possible to run a single Amber job on both K20s in a node, since those 2 GPUs do not have peer-to-peer communication (see the Amber GPU page for an explanation of peer-to-peer communication).

However, on the K80 nodes the GPUs do have peer-to-peer communication, so it is possible to run on 2 GPUs of a K80 node. It is important to specify which GPUs are used: in our benchmarks, the 2 GPUs on a single K80 card perform much worse than 2 GPUs on separate cards. The batch system cannot be told to allocate specific GPUs, so to run on 2 GPUs you have to allocate the whole node and tell Amber which 2 GPUs to use. It is most efficient to run 2 simultaneous Amber jobs on a K80 node, using different pairs of GPUs as in the example below.

Sample batch script:

#!/bin/bash

module load amber/16-gpu
cd /path/to/your/dir
# two simultaneous 2-GPU runs: one on GPUs 0 and 2, one on GPUs 1 and 3
# (set CUDA_VISIBLE_DEVICES on the same line as mpirun so it is passed to the Amber processes)
CUDA_VISIBLE_DEVICES=0,2 mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i in1 -o out1 -inf info1 -x crd1 -r r1 &
CUDA_VISIBLE_DEVICES=1,3 mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i in2 -o out2 -inf info2 -x crd2 -r r2 &
wait
Submit with:
sbatch --partition=ccrgpu --gres=gpu:k80:4 --exclusive --time=12:00:00  jobscript

As of Jan 2017, the K80s on the system are funded by NCI_CCR and therefore restricted to CCR users. Non-CCR users can access these nodes via the 'quick' queue, as long as their jobs are shorter than 4 hrs. In March 2017, 72 additional K80s will be added to the Biowulf cluster and available to all users.

Sample submission command for non-CCR users submitting to the K80s via the quick queue:

sbatch --partition=quick --constraint=gpuk80 --gres=gpu:k80:4 --exclusive --time=3:00:00 jobscript

You can check the behaviour of your job with the 'nvidia-smi' utility. Determine the GPU node on which your job is running via jobload or sjobs. Suppose your job is on node cn0626, and is using 2 GPUs:

biowulf% rsh cn0626 nvidia-smi
Mon Jan 16 17:52:13 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:83:00.0     Off |                  Off |
| N/A   41C    P0   115W / 149W |    145MiB / 12205MiB |     75%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:84:00.0     Off |                  Off |
| N/A   27C    P8    33W / 149W |      0MiB / 12205MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0000:8A:00.0     Off |                  Off |
| N/A   70C    P0   128W / 149W |    145MiB / 12205MiB |     80%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0000:8B:00.0     Off |                  Off |
| N/A   45C    P8    33W / 149W |      0MiB / 12205MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     51187    C   ...cal/apps/amber/amber16/bin/pmemd.cuda.MPI    61MiB |
|    0     51188    C   ...cal/apps/amber/amber16/bin/pmemd.cuda.MPI    79MiB |
|    2     51187    C   ...cal/apps/amber/amber16/bin/pmemd.cuda.MPI    79MiB |
|    2     51188    C   ...cal/apps/amber/amber16/bin/pmemd.cuda.MPI    61MiB |
+-----------------------------------------------------------------------------+
The GPU numbers reported by 'nvidia-smi' may not match the GPUs you specified with the 'CUDA_VISIBLE_DEVICES' variable.

Walltime limits and chaining jobs

Walltime limits are set on most Biowulf partitions. Type 'batchlim' to see the current walltime limits, or see the systems status page. Note that the default walltime on the norm partition is 4 hrs, but you can extend this to 10 days. Amber jobs should be designed to run for a week or so, save a checkpoint file, and then submit a new job starting from that checkpoint.

An example batch script is below. This script runs a single simulation, saves a copy of the output files, and then resubmits a new job starting from Amber's 'restart' file.

#!/bin/bash
# this file is called amber.run

module load  amber/16
module list

echo "Running on $SLURM_NTASKS corse"



# rename the restart file to the coordinate filename
mv restrt inpcrd

#run sander
mpirun -np $SLURM_NTASKS `which sander.MPI` -O -i mdin -c inpcrd -p prmtop -r restrt -x traj -e ene -o mdout

#keep a copy of the output from this run
mv mdout  md_$run.out
mv traj  md_$run.trj
mv ene  md_$run.ene
cp restrt  md_$run.rst

# if fewer than 10 runs have been performed, increment the run number and
# submit the next job, passing the counter to the new job via --export
if (( run < 10 ))
   then
     run=$((run + 1))
     sbatch --export=ALL,run=$run --ntasks=8 --ntasks-per-core=1 --time=168:00:00 --exclusive amber.run
fi

To submit this job, copy the original input coordinate file to 'restrt' for the first run, and then submit.

cp inpcrd restrt               
sbatch --ntasks=8 --ntasks-per-core=1 --time=168:00:00 --exclusive amber.run

Benchmarks

Based on the benchmarks, it is highly recommended that you run Amber on a GPU node. The performance on a single K20x and a single K80 is similar for most of the benchmarks run. It is not possible to run a single Amber job on both K20s in a node, since those 2 GPUs do not have peer-to-peer communication (see the Amber on GPUs page for an explanation of peer-to-peer communication).

If you run on the Biowulf K80 GPUs, it is possible to use 2 GPUs on a node, providing a performance advantage of about 50%. (See the note above on K80 access for NCI CCR and non-CCR users.)

Full benchmark details

Documentation

Amber 16 reference manual
Combined Amber 14 and Ambertools 15 reference manual
Amber website