Biowulf High Performance Computing at the NIH
GEM on HPC

GEM: High resolution peak calling and motif discovery for ChIP-seq and ChIP-exo data.

GEM is scientific software for studying protein-DNA interactions at high resolution using ChIP-seq/ChIP-exo data. It can also be applied to CLIP-seq and Branch-seq data.
GEM links binding event discovery and motif discovery with positional priors in the context of a generative probabilistic model of ChIP data and genome sequence, and resolves ChIP data into explanatory motifs and binding events at unsurpassed spatial resolution. GEM reciprocally improves motif discovery using binding event locations, and binding event predictions using discovered motifs.

GEM has the following features:

References:

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive --cpus-per-task=8 --mem=12g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load gem
[user@cn3144 ~]$ java -Xmx10g -jar $GEMJAR --t 8 --d ../3.0/Read_Distribution_default.txt \
--g ../3.0/mm10.chrom.sizes \
--genome /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Chromosomes/ \
--s 2000000000 --expt SRX000540_mES_CTCF.bed --ctrl SRX000543_mES_GFP.bed \
--f BED --out mouseCTCF --k_min 6 --k_max 13
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. batch.sh). For example:

#!/bin/bash
set -e
module load gem
java -Xmx10g -jar $GEMJAR --t 8 --d ../3.0/Read_Distribution_default.txt \
--g ../3.0/mm10.chrom.sizes \
--genome /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Chromosomes/ \
--s 2000000000 --expt SRX000540_mES_CTCF.bed --ctrl SRX000543_mES_GFP.bed \
--f BED --out mouseCTCF --k_min 6 --k_max 13

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=8 --mem=12g batch.sh
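
The batch script above sets the thread count (--t 8) and the Java heap (-Xmx10g) by hand, so they can drift from what sbatch actually allocates. One way to keep them in step is to read both from the environment Slurm sets inside the job. The following is a dry-run sketch, not a Biowulf recommendation: it only prints the command. It assumes SLURM_CPUS_PER_TASK and SLURM_MEM_PER_NODE (in MB) are set, which is the case when --cpus-per-task and --mem are given. The helper name heap_for_allocation, the 2048 MB headroom, the 1024 MB floor, and the fallback values are all illustrative assumptions.

```shell
#!/bin/bash
# Dry-run sketch: size GEM's threads and Java heap from the Slurm allocation.

# heap_for_allocation MEM_MB: print a heap size in MB that leaves headroom
# for JVM overhead. The 2048 MB headroom and 1024 MB floor are assumptions.
heap_for_allocation() {
    local mem_mb=$1
    local heap_mb=$(( mem_mb - 2048 ))
    if (( heap_mb < 1024 )); then
        heap_mb=1024
    fi
    echo "$heap_mb"
}

# Slurm sets these inside a job when --cpus-per-task and --mem are given;
# the fallbacks are illustrative values for running outside a job.
threads="${SLURM_CPUS_PER_TASK:-2}"
mem_mb="${SLURM_MEM_PER_NODE:-4096}"
heap_mb=$(heap_for_allocation "$mem_mb")

# Same arguments as the batch script above, with --t taken from the allocation.
gem_args=(
    --t "$threads"
    --d ../3.0/Read_Distribution_default.txt
    --g ../3.0/mm10.chrom.sizes
    --genome /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Chromosomes/
    --s 2000000000
    --expt SRX000540_mES_CTCF.bed --ctrl SRX000543_mES_GFP.bed
    --f BED --out mouseCTCF --k_min 6 --k_max 13
)

# Print the command instead of running it. To run for real, replace this line
# with: java -Xmx${heap_mb}m -jar "$GEMJAR" "${gem_args[@]}"
echo "java -Xmx${heap_mb}m -jar \$GEMJAR ${gem_args[*]}"
```

With this pattern, changing the sbatch request is enough to change both the thread count and the heap size; the script itself does not need editing.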

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. job.swarm). For example:

cd dir1; gem command
cd dir2; gem command
cd dir3; gem command

Submit this job using the swarm command.

swarm -f job.swarm -g 10 -t 4 --module gem
where
-g #      Number of gigabytes of memory required for each process (one line in the swarm command file)
-t #      Number of threads/CPUs required for each process (one line in the swarm command file)
--module  Loads the module for each subjob in the swarm
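
When there are many sample directories, the swarmfile can be generated with a loop rather than written by hand. The following is a minimal sketch, assuming each directory holds its own input files. The directory names dir1 to dir3 and the "gem command" placeholder come from the example above; the function name make_swarm is hypothetical.

```shell
#!/bin/bash
# make_swarm OUTFILE DIR...: write one swarm line per directory.
# "gem command" is the placeholder from the example above; substitute the
# full java -jar $GEMJAR ... command line for a real run.
make_swarm() {
    local outfile=$1
    shift
    : > "$outfile"    # create the swarmfile, or truncate an existing one
    local dir
    for dir in "$@"; do
        printf 'cd %s; gem command\n' "$dir" >> "$outfile"
    done
}

make_swarm job.swarm dir1 dir2 dir3
cat job.swarm
```

Each line of the resulting file is an independent subjob, so the -g and -t values passed to swarm apply to every directory in the list.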