CHARMM (Chemistry at HARvard Macromolecular Mechanics) [1]:
In order to provide a common interface to multiple executable types, which use a range of parallel communication methods and hardware, cover scripts have been made available in /usr/local/apps/charmm/bin on Biowulf, each named for the CHARMM release version it supports. The recommended setup is to use
module load charmm
to modify the command search path for your shell (bash or csh); the cover script will load any additional modules required. Besides the cover scripts, a tool for extracting data from CHARMM output log files, getprop.csh, is also available. Descriptions of the cover script commands, syntax listings, and usage examples are given in this section. General usage notes for running CHARMM via the SLURM queue and using the X11 graphics are given below, after the cover script descriptions.
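For example, after loading the module one can check which cover script versions are installed (a quick sketch; the exact contents of the directory will change as releases are added or retired):

module load charmm
which c42b2                      # confirm the cover script is on the search path
ls /usr/local/apps/charmm/bin    # list the available cover scripts plus getprop.csh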
biowulf /<6>charmm [5] module load charmm
biowulf /<6>charmm [6] c42b2 -help

 Syntax; square brackets indicate [ optional args ]

SINGLE PROC  c42b2 [options] [ charmm-args ] < file.inp >& file.out
        Omit "< file.inp >& file.out" for interactive use (e.g. graphics)
PARALLEL     c42b2 [options] ompi Nproc file.inp [ charmm-args ] >& file.out   # OpenMPI

             c42b2 -h | -help    # this listing

Notes:
  [options] ; must precede parallel keywords, order dependent
    verbose :: prints additional environment info; must be *first*
    ddg     :: domdec_gpu; requires node with GPU (-p gpu)
    sse     :: override AVX architecture detection
   PM Ewald type override option; default includes COLFFT and DOMDEC:
    async   :: alt. (slower) PME method, incl. REPDSTR and MSCALE
  Parallel args; input filename required as 3rd arg
    ompi    :: use the OpenMPI parallel library
    Nproc   :: number of MPI processes; one per core, or one per GPU (ddg)
  charmm-args :: optional script @ vars, in the form N:27 or RUN=15 etc.
        (N.B. must follow any options and parallel args)

Examples:

 c42b2 MDL:2 < minmodel.inp >& minmodel.out         # single proc min
 c42b2 ompi 16 minmodel.inp MDL:2 >& minmodel.out   # minimization
 c42b2 ompi 64 dyn.inp >& dyn.out                   # COLFFT or DOMDEC
 c42b2 ddg ompi 8 dyn.inp >& dyn.out                # DOMDEC_GPU
 c42b2 ompi 64 dyn.inp -chsize 450000 >& dyn.out    # 450000 atom limit
 c42b2 sse ompi 64 dyn.inp >& dyn.out               # force use of SSE on AVX host
 c42b2 async ompi 32 dyn.inp >& dyn.out             # async; P21 symmetry, REPDSTR
The above usage examples illustrate the positional keywords; parallel usage requires the ompi keyword, followed by two more arguments: the number of cores (not SLURM cpus!) and the input file name. The optional, mutually exclusive arguments ddg and async invoke different executables, compiled with different feature sets and with different run-time library requirements; the c42b2 cover script loads the modules needed (e.g. CUDA for ddg) and then invokes the requested executable. The async executable adds support for features such as replica exchange MD, non-orthogonal crystal lattices, simulation of the asymmetric unit with rotations of the simulation cell, and a number of other custom features; however, it does NOT support the fast DOMDEC code.
The sse keyword selects an executable built with an older generation of Intel floating-point (SSE) instructions rather than AVX, and is mainly used for testing.
Which options to use depends on the type of calculation or operation being done, the molecular system, the boundary conditions, and other factors.
For a non-parallel CHARMM job such as model building or ad hoc trajectory analysis, the commands and setup have few requirements. The job script (build.csh) can be simply:
#!/bin/csh
cd $SLURM_SUBMIT_DIR
module load charmm
c42b2 < build-psf.inp >& build-psf.out
The above can be submitted to the batch queue via:
sbatch build.csh
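Once submitted, the job can be followed with standard SLURM commands; a minimal check (the file name here follows the build.csh example above) might be:

squeue -u $USER       # list your pending and running jobs
less build-psf.out    # examine the CHARMM log once the job has finished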
For parallel usage, the following script (sbatch.csh) illustrates submitting a SLURM batch job which will use the 16 physical cores on each of 4 nodes (64 cores total):
#!/bin/csh
# use subdir name for job id and log file names
set id = $cwd:t
# nodes
@ n = 4
# tasks, for 16 core nodes
@ nt = 16 * $n
sbatch --ntasks=$nt -J $id -o $id.%j job.csh
Assuming the Infiniband (multinode) nodes will be used, the job.csh script contains:
#!/bin/csh
#SBATCH --partition=multinode
#SBATCH --exclusive
#SBATCH --ntasks-per-core=1
cd $SLURM_SUBMIT_DIR
module load charmm
c42b2 ompi $SLURM_NTASKS charmmrun.inp >& charmmrun.out
The environment variable SLURM_SUBMIT_DIR points to the working directory where 'sbatch' was run, and SLURM_NTASKS contains the value given with the --ntasks= argument to sbatch. The above is suitable for most parallel CHARMM usage, other than the DOMDEC_GPU code invoked via the ddg cover script keyword (see below). A more detailed example for long simulations, also using separate scripts for job submission and job execution, is given in the Daisy Chaining section below. Of course, one could submit job.csh directly via e.g.:
sbatch --ntasks=64 -J MyJob -o MyJob.%j job.csh
where %j represents the SLURM job number.
The DOMDEC_GPU code invoked via the ddg cover script keyword uses both an MPI library (OpenMPI in this case) and OpenMP threads, and therefore requires changes to the SLURM sbatch arguments. The changes are shown in the example below (sbatchGPU.csh); the --nodes= argument must be included, and the --ntasks= value must be twice the number of nodes, as each node has two GPU devices.
#!/bin/csh
# use subdir name for job id and log file names
set id = $cwd:t
# nodes
@ n = 4
# tasks, for nodes with 2 GPUs
@ nt = 2 * $n
# use all eight GPUs on four nodes
sbatch --ntasks=$nt --nodes=$n -J $id -o $id.%j gpujob.csh
A minimal job script (gpujob.csh) follows, illustrating the additional SLURM directives needed; none of them depend on the number of nodes requested. Other than requesting --exclusive use of the nodes, all of the directives differ from those in the job.csh example above for general parallel usage.
#!/bin/csh
#SBATCH --partition=gpu
#SBATCH --exclusive
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:k20x:2
cd $SLURM_SUBMIT_DIR
module load charmm
c42b2 ddg ompi $SLURM_NTASKS charmmgpu.inp >& charmmgpu.out
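As with the general parallel case, gpujob.csh could also be submitted directly rather than via the wrapper script; a sketch for four nodes (eight GPUs), using a hypothetical job name, would be:

sbatch --nodes=4 --ntasks=8 -J MyGpuJob -o MyGpuJob.%j gpujob.csh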
For a variety of tasks such as model building, analysis, and graphics, foreground interactive use of CHARMM can be advantageous, especially when developing and testing a new input script. The SLURM sinteractive command makes this fairly easy; note in the transcript below that the shell prompt changes from biowulf to the name of the allocated compute node:
biowulf /<2>EwaldNVE [69] sinteractive
salloc.exe: Granted job allocation 1693180
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0032 are ready for job
cn0032 /<2>EwaldNVE [1] module load charmm
cn0032 /<2>EwaldNVE [2] c42b2 < build-psf.inp >& build-psf.out
cn0032 /<2>EwaldNVE [3] exit
exit
salloc.exe: Relinquishing job allocation 1693180
salloc.exe: Job allocation 1693180 has been revoked.
By adding a couple of SLURM options, one can run parallel minimization jobs as well:
biowulf /<2>EwaldNVE [70] sinteractive -n 8 --ntasks-per-core=1
salloc.exe: Granted job allocation 1693189
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0254 are ready for job
cn0254 /<2>EwaldNVE [1] module load charmm
cn0254 /<2>EwaldNVE [2] c42b2 < build-psf.inp >& build-psf.out
cn0254 /<2>EwaldNVE [3] c42b2 ompi 8 minmodel.inp >& minmodel.out &
cn0254 /<2>EwaldNVE [4] exit
exit
salloc.exe: Relinquishing job allocation 1693189
For troubleshooting, it may be useful to pipe the output, both saving it to a file (via 'tee') and viewing it in the 'less' pager, e.g.:
cn0254 /<2>EwaldNVE [3] c42b2 ompi 8 minmodel.inp |& tee minmodel.out | less
Finally, CHARMM itself can be run interactively, via simply:
biowulf /<2>EwaldNVE [71] sinteractive
salloc.exe: Granted job allocation 1693592
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0103 are ready for job
cn0103 /<2>EwaldNVE [1] module load charmm
cn0103 /<2>EwaldNVE [2] c42b2
Linux cn0074 3.10.0-693.2.2.el7.x86_64 #1 SMP Tue Sep 12 22:26:13 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
 16:39:01 up 2 days, 17:06, 0 users, load average: 18.98, 18.69, 18.38
[+] Loading Intel 2018.1.163 Compilers ...
 >>>========>>  FOR SYNTAX AND NOTES, TRY "c42b2 -help"
-rwxrwxr-x 1 venabler 38816824 Jun 3 16:25 /usr/local/apps/charmm/c42b2/em64t/ifortavx.x11
/usr/local/apps/charmm/c42b2/em64t/ifortavx.x11
1

              Chemistry at HARvard Macromolecular Mechanics
        (CHARMM) - Developmental Version 42b2     February 15, 2018
     Copyright(c) 1984-2014  President and Fellows of Harvard College
                         All Rights Reserved
  Current operating system: Linux-3.10.0-693.2.2.el7.x86_64(x86_64)@cn0074
             Created on  6/7/18 at 16:39:01 by user: venabler

      Maximum number of ATOMS:    360720, and RESidues:      120240
At this point the program is expecting input, starting with a title; it is recommended to type bomlev -1 as the first command, as that will forgive typing errors and allow the program to continue. It is also recommended to have the initial setup commands (reading RTF and PARAM files, PSF and COOR files, etc.) in a 'stream' file, so that those actions can be done via e.g.
stream init.str
The same applies to other complex setups, such as establishing restraints, or graphics setup.
Note that CHARMM graphics use X11, so the initial login to Biowulf should use either the -X or the -Y option of the ssh command, to enable X11 forwarding for the graphics display.
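For example (the user name here is hypothetical):

ssh -Y someuser@biowulf.nih.gov    # -Y (or -X) enables X11 forwarding for graphics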
Recent versions of the distributed CHARMM parameters, including the latest release, are available in /usr/local/apps/charmm as subdirectories topparYYYY where YYYY is the release year. Each release contains a number of corrections and additions from the past year, esp. for the CHARMM force fields. The toppar2017 release uses a newer format, so one should be careful about using it with older PSF files. Files distributed with models built using CHARMM-GUI now use this newer format.
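The available parameter releases can be listed with, e.g.:

ls -d /usr/local/apps/charmm/toppar*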
In order to run a long series of CHARMM calculations, a systematic 'daisy chaining' method has been developed over many years and operating systems. There are several basic concepts, all illustrated in the job script below: each run reads the restart file of the previous run (dyn.rea) and writes a new one (dyn.res); the output files are checked for normal termination, then numbered and moved into subdirectories; a sequence number file (next.seqno) tracks progress; a successful run resubmits the next job until an optional final sequence number (last.seqno) is reached; and a failed run mails a diagnostic message to the user and exits.
#!/bin/csh
#SBATCH --partition=multinode
#SBATCH --exclusive
#SBATCH --ntasks-per-core=1
# ASSUMPTION (1): output files are named dyn.res dyn.dcd dyn.crd dyn.out
# ASSUMPTION (2): previous restart file read as dyn.rea
#
cd $SLURM_SUBMIT_DIR
if ( ! -d Res ) mkdir Res
if ( ! -d Out ) mkdir Out
if ( ! -d Crd ) mkdir Crd
if ( ! -d Trj ) mkdir Trj
set chm = "/usr/local/apps/charmm/bin/c42b2 ompi $SLURM_NTASKS"
set nrun = 1
set d = $cwd:t
@ krun = 1
while ( $krun <= $nrun )
  if ( -e next.seqno ) then
    $chm dyn.inp D:$d > dyn.out
  else
    $chm dynstrt.inp D:$d > dyn.out
  endif
  set okay = true
  # TEST FOR EXISTENCE, THEN NONZERO LENGTH OF OUTPUT FILES
  if ( -e dyn.res && -e dyn.dcd ) then
    @ res = `wc dyn.res | awk '{print $1}'`
    @ tsz = `ls -s dyn.dcd | awk '{print $1}'`
    @ nrm = `grep ' NORMAL TERMINATION ' dyn.out | wc -l`
    if ( $res > 100 && $tsz > 0 && $nrm == 1 ) then
      # SUCCESSFUL RUN; COPY RESTART FILE
      cp dyn.res dyn.rea
      # DETERMINE RUN NUMBER
      if ( -e next.seqno ) then
        @ i = `cat next.seqno`
      else
        @ i = 1
      endif
      # NUMBER AND MOVE THE OUTPUT FILES
      mv dyn.out Out/dyn$i.out
      mv dyn.crd Crd/dyn$i.crd
      mv dyn.res Res/dyn$i.res
      mv dyn.dcd Trj/dyn$i.dcd
      gzip -f Out/dyn$i.out Res/dyn$i.res Crd/dyn$i.crd
      # CONDITIONAL END CHECK
      if ( -e last.seqno ) then
        @ l = `cat last.seqno`
        if ( $i == $l ) then
          @ i += 1
          echo $i > next.seqno
          exit
        endif
      endif
      @ i += 1
      echo $i > next.seqno
    else
      # ZERO LENGTH FILE(S)
      set okay = false
    endif
  else
    # FILE DOESN'T EXIST
    set okay = false
  endif
  # TEST FOR CHARMM RUN FAILED; CREATE .ERR FILE WITH TIMESTAMP
  if ( $okay == true ) then
    # SUBMIT THE NEXT JOB
    if ( $krun == $nrun ) ./sbatch.csh
    @ krun += 1
  else
    set ts = `date +%m%d.%H%M`
    date > msg.tmp
    echo $cwd >> msg.tmp
    head -64 dyn.out >> msg.tmp
    tail -64 dyn.out >> msg.tmp
    mv dyn.out dyn.err.$ts
    mail -s "$SLURM_JOB_NAME $SLURM_JOB_ID $ts" $USER@helix.nih.gov < msg.tmp
    exit(201)
  endif
end
The above can be used for most parallel CHARMM usage (except for GPUs); for some types of calculations, the async keyword may be needed as well.
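To start a chain in a fresh directory (assuming the sbatch.csh and daisy-chain job script shown above, with dynstrt.inp and dyn.inp as the startup and continuation inputs), one might simply set an optional final sequence number and submit the first job:

echo 50 > last.seqno   # optional: stop the chain after run number 50 (50 is just an example)
./sbatch.csh           # the first run uses dynstrt.inp, since next.seqno does not yet exist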
Plain text versions of the scripts are given in the following links; after downloading, rename them to .csh files and make them executable, e.g.
mv sbatch.txt sbatch.csh
chmod u+x sbatch.csh
The plot below is from CHARMM benchmarks run during the beta test phase of Biowulf, and shows results for DOMDEC on Infiniband nodes (solid lines) and DOMDEC_GPU on gpu nodes (dotted lines), for even numbers of nodes from 2 through 16. The timings in ns/day are from short MD simulations, run with a 1 fs integration time step, for 3 molecular systems of different sizes and shapes:
Note that the ns/day rate would be doubled with the use of a 2 fs time step, which is often done for more exploratory sampling, but is not necessarily recommended for the best accuracy and precision. Simulation systems that cannot use DOMDEC will be somewhat slower, and will not scale well past about 64 cores.