ChromHMM on Biowulf & Helix

Description

ChromHMM can segment genomes into different chromatin states by modeling recurring combinatorial and spatial patterns of various histone modifications with a multivariate Hidden Markov Model. The resulting segmentations can be used to annotate genomes, and bed files are generated for immediate visualization in genome browsers.

ChromHMM automatically computes state enrichments for functional and annotation datasets (TSSs, exons, ...) which facilitates the biological characterization of each state.

There are multiple versions of ChromHMM available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail ChromHMM 

To select a module use

module load ChromHMM/[version]

where [version] is the version of choice.

ChromHMM LearnModel can use more than one CPU. Make sure to match the number of CPUs requested from Slurm with the number given to the -p option.
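
For example, a job that requests 8 CPUs would pass -p 8 to LearnModel. A minimal sketch, using the ChromHMM.sh wrapper described below; the input and output directory names are placeholders:

ChromHMM.sh LearnModel -p 8 INPUTDIR OUTPUTDIR 10 hg18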

ChromHMM can be run in two ways. The jar file can be invoked directly, using either its full path or the $CHROMHMM_JAR environment variable:

java -mx4000M -Djava.awt.headless=true -jar $CHROMHMM_HOME/ChromHMM.jar [ command ] [ options ]
java -mx4000M -Djava.awt.headless=true -jar $CHROMHMM_JAR [ command ] [ options ]

The Java heap size is set with -mx[amount]; in the examples above, 4000 MB is allocated. The option -Djava.awt.headless=true is required unless an X11 display is available. See the NIH HPC documentation on X11 forwarding for more information.

An easier way is to use the wrapper script ChromHMM.sh, which includes an additional option to set the amount of memory:

ChromHMM.sh --memory 8g [ command ] [ options ]

By default, the wrapper allocates 4 GB of memory. To allocate a different amount, for example 20 GB, include --memory 20g on the command line.
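
Beyond the pre-binarized sample data used below, a full analysis typically starts from aligned reads. A rough sketch of such a workflow with the wrapper script; all file, directory, and cell names here are hypothetical placeholders, and hg38 is only an example assembly:

# binarize the aligned reads listed in cellmarkfiletable.txt into
# presence/absence calls per 200 bp bin (ChromHMM's default bin size)
ChromHMM.sh --memory 20g BinarizeBam hg38.chrom.sizes BAMS cellmarkfiletable.txt BINARIZED

# learn a 10-state model; segmentation bed files and the summary webpage go to LEARNED
ChromHMM.sh --memory 20g LearnModel -p 4 BINARIZED LEARNED 10 hg38

# optionally compute state enrichments for a custom annotation directory;
# CELL_10_segments.bed stands in for the segmentation file LearnModel produces
ChromHMM.sh --memory 8g OverlapEnrichment LEARNED/CELL_10_segments.bed ANNOTATIONS LEARNED/custom_enrichment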

Environment variables set

$CHROMHMM_HOME    ChromHMM installation directory
$CHROMHMM_JAR     full path to ChromHMM.jar


On Helix

helix$ module load ChromHMM
helix$ cp -R /usr/local/apps/ChromHMM/TEST_DATA/SAMPLEDATA_HG18 .
helix$ ChromHMM.sh LearnModel SAMPLEDATA_HG18 OUTPUTSAMPLE 10 hg18

The output in OUTPUTSAMPLE includes webpage_10.html, which contains a summary of the run. If an X11 connection exists, the page can be displayed with

helix$ firefox OUTPUTSAMPLE/webpage_10.html

Otherwise, open the HTML file via helixdrive or copy it to your local workstation.
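
For example, the output directory can be copied with scp, run from the local machine; the username and path here are placeholders:

scp -r user@helix.nih.gov:/path/to/OUTPUTSAMPLE .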

Batch job on Biowulf

Create a batch script similar to the following example:

#!/bin/bash
# This file is chromHMM.sh

function fail() {
    echo "$@" >&2
    exit 1
}

module load ChromHMM || fail "could not load module ChromHMM"
cp -R $CHROMHMM_HOME/SAMPLEDATA_HG18 . || fail "could not copy sample data"

# -p is matched to the number of CPUs allocated by Slurm
ChromHMM.sh --memory 8g LearnModel -p ${SLURM_CPUS_PER_TASK} \
  SAMPLEDATA_HG18 OUTPUTSAMPLE 10 hg18

Submit to the queue with sbatch:

b2$ sbatch --cpus-per-task=8 --mem=8g chromHMM.sh

Swarm of jobs on Biowulf

Create a swarm command file (e.g. ChromHMM.swarm) similar to the following example, in which models with different numbers of states are learned in parallel:

ChromHMM.sh --memory 8g LearnModel -p ${SLURM_CPUS_PER_TASK} \
  SAMPLEDATA_HG18 OUTPUTSAMPLE8 8 hg18
ChromHMM.sh --memory 8g LearnModel -p ${SLURM_CPUS_PER_TASK} \
  SAMPLEDATA_HG18 OUTPUTSAMPLE10 10 hg18
ChromHMM.sh --memory 8g LearnModel -p ${SLURM_CPUS_PER_TASK} \
  SAMPLEDATA_HG18 OUTPUTSAMPLE12 12 hg18
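
Such a file can also be generated with a short bash loop rather than written by hand; a minimal sketch, assuming the same sample data and states 8, 10, and 12:

# write one LearnModel command per line; the backslash keeps
# ${SLURM_CPUS_PER_TASK} unexpanded until the swarm job runs
for n in 8 10 12; do
    echo "ChromHMM.sh --memory 8g LearnModel -p \${SLURM_CPUS_PER_TASK} SAMPLEDATA_HG18 OUTPUTSAMPLE${n} ${n} hg18"
done > ChromHMM.swarm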

Submit the swarm file to the queue with the swarm command:

b2$ swarm -f ChromHMM.swarm -t 8 -g 8 --module ChromHMM

Interactive job on Biowulf

Allocate an interactive session with sinteractive and use ChromHMM as described above:

b2$ sinteractive --mem=8g --cpus-per-task=4
node$ module load ChromHMM
node$ ChromHMM.sh --memory 8g LearnModel -p 4 SAMPLEDATA_HG18 OUTPUTSAMPLE8 8 hg18
node$ exit
b2$