High-Performance Computing at the NIH
EMAN2 on Biowulf

EMAN2 is the successor to EMAN1. It is a broadly based greyscale scientific image processing suite with a primary focus on processing data from transmission electron microscopes. EMAN's original purpose was performing single particle reconstructions (3-D volumetric models from 2-D cryo-EM images) at the highest possible resolution, but the suite now also offers support for single particle cryo-ET, and tools useful in many other subdisciplines such as helical reconstruction, 2-D crystallography and whole-cell tomography. Image processing in a suite like EMAN differs from consumer image processing packages like Photoshop in that pixels in images are represented as floating-point numbers rather than small (8-16 bit) integers. In addition, image compression is avoided entirely, and there is a focus on quantitative analysis rather than qualitative image display.

Author: Prof. Steve Ludtke

Reference: EMAN2: an extensible image processing suite for electron microscopy. (2007) Tang G, Peng L, Baldwin PR, Mann DS, Jiang W, Rees I, Ludtke SJ, J Struct Biol

EMAN2 requires some environment variables to be set. The simplest way is to use environment modules:

module load EMAN2
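As a minimal sketch, a script can confirm that the module actually placed the EMAN2 programs on the PATH before doing any work. The helper name `require_cmd` is hypothetical and is not part of EMAN2 or the module system.

```shell
#!/bin/bash
# require_cmd is a hypothetical helper: it succeeds only when the named
# program can be found on the current PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1 not found on PATH; did 'module load' succeed?" >&2
        return 1
    fi
}
```

In a batch script this could be used as `require_cmd e2refine.py || exit 1` right after the `module load` line, so the job stops early instead of failing partway through.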
Submitting an EMAN2 batch job

Create a batch script along the following lines:

#!/bin/bash
# set the environment properly
module load EMAN2/2.2

# always a good practice
export TMPDIR=/lscratch/${SLURM_JOB_ID}

# Run refinement.  Make sure to replace the input, output, and reference files,
# as well as any options needed.  This command runs one thread per allocated
# cpu (32 with the sbatch command below) and stores temporary files in
# /lscratch/$SLURM_JOB_ID.

e2refine.py \
  --parallel=thread:${SLURM_CPUS_PER_TASK:=1}:/lscratch/${SLURM_JOB_ID} \
  --input=bdb:sets#set2-allgood_phase_flipped-hp \
  --mass=1200.0 \
  --apix=2.9 \
  --automask3d=0.7,24,9,9,24 \
  --iter=1 \
  --sym=c1 \
  --model=bdb:refine_02#threed_filt_05 \
  --path=refine_sge \
  --orientgen=eman:delta=3:inc_mirror=0 \
  --projector=standard \
  --simcmp=frc:snrweight=1:zeromask=1 \
  --simalign=rotate_translate_flip \
  --simaligncmp=ccc \
  --simralign=refine \
  --simraligncmp=frc:snrweight=1 \
  --twostage=2 \
  --classcmp=frc:snrweight=1:zeromask=1 \
  --classalign=rotate_translate_flip \
  --classaligncmp=ccc \
  --classralign=refine \
  --classraligncmp=frc:snrweight=1 \
  --classiter=1 \
  --classkeep=1.5 \
  --classnormproc=normalize.edgemean \
  --classaverager=ctf.auto \
  --sep=5 \
  --m3diter=2 \
  --m3dkeep=0.9 \
  --recon=fourier \
  --m3dpreprocess=normalize.edgemean \
  --m3dpostprocess=filter.lowpass.gauss:cutoff_freq=.1 \
  --pad=256 \
  --lowmem \
  --classkeepsig \
  --classrefsf \
  --m3dsetsf -v 2

e2bdb.py -cF

Then submit the job, allocating the appropriate number of processors and 50GB of scratch space.

$ sbatch --cpus-per-task=32 --ntasks=1  --gres=lscratch:50 EMAN2.sh

This job will use 32 CPUs on a single node in multithreaded mode. Running in parallel mode also requires local scratch, which is why the job allocates it with --gres=lscratch.
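The value passed to `--parallel` in the script above has the form `thread:<count>:<scratch directory>`. The following sketch shows how that string is assembled from the Slurm environment, assuming the job was submitted with `--cpus-per-task` and `--gres=lscratch`. The function name `thread_spec` is hypothetical and only illustrates the shell expansion.

```shell
#!/bin/bash
# thread_spec is a hypothetical helper that prints the --parallel value
# for multithreaded mode, e.g. thread:32:/lscratch/12345
thread_spec() {
    # Fall back to a single thread when SLURM_CPUS_PER_TASK is unset or empty.
    local ncpus="${SLURM_CPUS_PER_TASK:-1}"
    # Without a job id there is no /lscratch directory to point at.
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "SLURM_JOB_ID is not set; run this inside a Slurm job" >&2
        return 1
    fi
    printf 'thread:%s:/lscratch/%s\n' "$ncpus" "$SLURM_JOB_ID"
}
```

Inside a job, `--parallel="$(thread_spec)"` would then expand the same way as the inline `${SLURM_CPUS_PER_TASK:=1}` form used in the batch script.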

EMAN2 can also be run in parallel using MPI instead of multithreading. MPI is inherently less efficient than multithreading on a single node, but it can improve performance when the job is spread across multiple nodes, especially with a very large number of images or particles (> 500K).

Here is an example of an MPI job (e.g. EMAN2.sh):

#!/bin/bash
module load EMAN2/2.2

# always a good practice
export TMPDIR=/lscratch/${SLURM_JOB_ID}

# Here is the command
e2refine_easy.py --input=starting.lst \
  --model=starting_models/model.hdf \
  --targetres=8.0 --speed=5 --sym=c1 \
  --tophat=local --mass=500.0 --apix=0.86 \
  --classkeep=0.5 --classautomask --prethreshold --m3dkeep=0.7 \
  --parallel=mpi:${SLURM_NTASKS:=1}:/lscratch/${SLURM_JOB_ID} \
  --threads ${SLURM_CPUS_PER_TASK:=1} \
  --automaskexpand=-1 --ampcorrect=auto

e2bdb.py -cF

Then submit, using the proper partition and allocating matching resources:

$ sbatch --partition=multinode --cpus-per-task=1 --ntasks=512 --gres=lscratch:100 --mem-per-cpu=4g --time=1-00:00:00 EMAN2.sh

MPI parallelization in EMAN2 is limited to no more than 1024 MPI tasks.
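Since the MPI task count comes straight from `--ntasks`, a batch script can guard against exceeding this limit before the refinement starts. The sketch below is one possible way to do that. The names `MAX_MPI_TASKS` and `mpi_spec` are hypothetical, and the limit of 1024 is taken from the statement above.

```shell
#!/bin/bash
# Upper bound on MPI tasks, as stated in this document.
MAX_MPI_TASKS=1024

# mpi_spec is a hypothetical helper: it prints the --parallel value for MPI
# mode, or fails when the requested task count exceeds the limit.
# The task count is taken from $1, else from SLURM_NTASKS, else 1.
mpi_spec() {
    local ntasks="${1:-${SLURM_NTASKS:-1}}"
    if [ "$ntasks" -gt "$MAX_MPI_TASKS" ]; then
        echo "Requested $ntasks MPI tasks; the limit is $MAX_MPI_TASKS" >&2
        return 1
    fi
    printf 'mpi:%s:/lscratch/%s\n' "$ntasks" "${SLURM_JOB_ID:-}"
}
```

A script could call `spec="$(mpi_spec)" || exit 1` and then pass `--parallel="$spec"`, which keeps the MPI task count tied to what was requested with `--ntasks`.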

Graphical Interface for EMAN2

EMAN2 can be run using a graphical interface (GUI).

This application requires an X-Windows connection. XQuartz (v2.7.x) is known to be incompatible with EMAN2, so users are encouraged to use NX or FastX as their X11 servers.

[biowulf]$ sinteractive --cpus-per-task=4 --mem=12g --gres=lscratch:50
[node]$ module load EMAN2
[node]$ e2projectmanager.py
[Screenshot: project manager GUI]
[node]$ e2display.py my_image.hdf
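Because the GUI programs need an X-Windows connection, it can help to check for one before launching them. This is a minimal sketch that assumes a missing or empty `DISPLAY` variable means no X11 connection is available. The function name `have_x11` is hypothetical.

```shell
#!/bin/bash
# have_x11 is a hypothetical helper: it succeeds only when DISPLAY is set,
# which is the usual sign that an X11 connection is available.
have_x11() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; connect through NX or FastX first" >&2
        return 1
    fi
    return 0
}
```

On the compute node this could be used as `have_x11 && e2projectmanager.py`, so the project manager only starts when a display is present.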