High-Performance Computing at the NIH
Gem on Biowulf & Helix

Description

GEM: High resolution peak calling and motif discovery for ChIP-seq and ChIP-exo data

Citation:
High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. Yuchun Guo, Shaun Mahony & David K Gifford, (2012) PLoS Computational Biology 8(8): e1002638.

GEM is scientific software for studying protein-DNA interactions at high resolution using ChIP-seq/ChIP-exo data. It can also be applied to CLIP-seq and Branch-seq data.
GEM links binding event discovery and motif discovery with positional priors in the context of a generative probabilistic model of ChIP data and genome sequence, and resolves ChIP data into explanatory motifs and binding events at high spatial resolution. GEM reciprocally improves motif discovery using binding event locations, and binding event predictions using discovered motifs.

GEM has the following features:

There may be multiple versions available on our systems. An easy way of selecting the version is to use modules. To see the modules available, type

module avail gem 

To select a module use

module load gem/[version]

where [version] is the version of choice.

Environment variables set

 

Test/Configuration

- Make sure to use the --t flag to specify the number of threads; otherwise the job may overload the node.

- The following examples use test files, which can be copied from /usr/local/apps/gem/gps_test
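
One way to keep the --t value in step with the allocation is to read it from the Slurm environment instead of hard-coding it. The following is a minimal sketch, not part of the GEM distribution: the function name gem_threads is hypothetical, and the fallback of 2 threads outside a Slurm job is an assumption. SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested.

```shell
#!/bin/bash
# Pick the GEM thread count from the Slurm allocation.
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested.
# The fallback of 2 is an assumption for runs outside a Slurm job.
gem_threads() {
    echo "${SLURM_CPUS_PER_TASK:-2}"
}

# Show the flag that would be passed to GEM.
echo "--t $(gem_threads)"
```

The result can then be substituted into the command line, for example java -Xmx10g -jar $GEMJAR --t $(gem_threads) followed by the remaining options.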

node$ module load gem

node$ java -Xmx10g -jar $GEMJAR --t 24 \
--d ./Read_Distribution_default.txt \
--g ./mm10.chrom.sizes \
--genome /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Chromosomes/ \
--s 2000000000 \
--expt SRX000540_mES_CTCF.bed \
--ctrl SRX000543_mES_GFP.bed \
--f BED \
--out mouseCTCF --k_min 6 --k_max 13

The test job took about 10 minutes.

 

Documentation

http://groups.csail.mit.edu/cgs/gem/

Interactive job on Biowulf

Allocate an interactive session with sinteractive and run GEM as described below:

biowulf$ sinteractive --mem=12g --cpus-per-task=24
salloc.exe: Pending job allocation 38978697
[...snip...]
salloc.exe: Nodes cn2273 are ready for job
node$ module load gem
[+] Loading gem
node$ # Copy the test data
node$ cp -r /usr/local/apps/gem/gps_test .
node$ 
node$ java -Xmx10g -jar $GEMJAR --t 24 \
      --d ./Read_Distribution_default.txt \
      --g ./mm10.chrom.sizes \
      --genome /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Chromosomes/ \
      --s 2000000000 \
      --expt ./SRX000540_mES_CTCF.bed \
      --ctrl ./SRX000543_mES_GFP.bed \
      --f BED \
      --out mouseCTCF --k_min 6 --k_max 13
[...snip...]
node$ exit
biowulf$
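
The session above requests 12 GB from Slurm and gives Java a 10 GB heap, leaving some room for JVM overhead. A minimal sketch of deriving the heap size from the allocation is shown here. The function name gem_heap_mb, the 2048 MB headroom, the 4096 MB fallback, and the 1024 MB floor are all assumptions for illustration. SLURM_MEM_PER_NODE is reported by Slurm in MB when --mem is requested.

```shell
#!/bin/bash
# Derive a Java heap size (in MB) from the Slurm memory allocation.
# SLURM_MEM_PER_NODE is in MB when --mem is requested.
# Headroom (2048 MB), fallback (4096 MB), and floor (1024 MB) are assumptions.
gem_heap_mb() {
    local alloc_mb="${SLURM_MEM_PER_NODE:-4096}"
    local headroom_mb=2048
    local heap_mb=$(( alloc_mb - headroom_mb ))
    # Do not go below the floor, even for very small allocations.
    if (( heap_mb < 1024 )); then
        heap_mb=1024
    fi
    echo "$heap_mb"
}

# Show the JVM option that would be used.
echo "-Xmx$(gem_heap_mb)m"
```

With --mem=12g the allocation is 12288 MB, so this sketch yields a 10240 MB heap, which matches the -Xmx10g used in the example.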

 

Batch job on Biowulf

Create a batch script similar to the following example:

#! /bin/bash
# this file is gem.batch

module load gem || exit 1
cp -r /usr/local/apps/gem/gps_test /data/$USER/dir
cd /data/$USER/dir
java -Xmx10g -jar $GEMJAR --t 24 \
      --d ./Read_Distribution_default.txt \
      --g ./mm10.chrom.sizes \
      --genome /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Chromosomes/ \
      --s 2000000000 \
      --expt SRX000540_mES_CTCF.bed \
      --ctrl SRX000543_mES_GFP.bed \
      --f BED \
      --out mouseCTCF --k_min 6 --k_max 13
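
Because a batch job may wait in the queue before it starts, it can help to verify the input files before launching GEM. The following is a minimal sketch of such a pre-flight check. The function name check_inputs is hypothetical and is not part of GEM or of the Biowulf environment.

```shell
#!/bin/bash
# Return 0 only if every file given as an argument exists and is non-empty.
# Problems are reported on stderr so they appear in the Slurm job log.
check_inputs() {
    local f missing=0
    for f in "$@"; do
        if [[ ! -s "$f" ]]; then
            echo "missing or empty input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use inside gem.batch, before the java command:
# check_inputs Read_Distribution_default.txt mm10.chrom.sizes \
#     SRX000540_mES_CTCF.bed SRX000543_mES_GFP.bed || exit 1
```

Exiting early with a non-zero status marks the job as failed right away instead of letting GEM stop partway through.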
    

Submit to the queue with sbatch:

biowulf$ sbatch --cpus-per-task=24 --mem=12g gem.batch

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. script.swarm). For example:

# this file is called script.swarm
cd dir1;gem command 1;gem command 2
cd dir2;gem command 1;gem command 2
cd dir3;gem command 1;gem command 2
[...]

Submit this job using the swarm command.

swarm -f script.swarm --module gem -t 24

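The above command assumes that each job uses 24 threads when running gem, as in the examples above. If the commands set a Java heap such as -Xmx10g, also request enough memory per process with the swarm -g option. For more information about swarm, see https://hpc.nih.gov/apps/swarm.html#usage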
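
When there are many sample directories, the swarmfile can be generated rather than written by hand. The following is a minimal sketch. The function name make_swarm, the directory names sample1 to sample3, and the file names expt.bed and gem_out are hypothetical placeholders, and the GEM options on each line are an illustrative subset of those used in the examples above.

```shell
#!/bin/bash
# Write one swarm line per sample directory given as an argument.
# $GEMJAR is kept literal (single quotes) so it is expanded when the
# swarm job runs, after the gem module has been loaded.
make_swarm() {
    local dir
    for dir in "$@"; do
        printf 'cd %s; java -Xmx10g -jar $GEMJAR --t 24 --d Read_Distribution_default.txt --expt expt.bed --f BED --out gem_out\n' "$dir"
    done
}

# Hypothetical sample directories; replace with real paths.
make_swarm sample1 sample2 sample3 > script.swarm
```

Each line of the resulting script.swarm is an independent job, so the file can be submitted with the swarm command shown above.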