High-Performance Computing at the NIH
GitHub YouTube @nih_hpc RSS Feed
macs on Biowulf

Model-based Analysis of ChIP-Seq (MACS) is used on short reads sequencers such as Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms and can be used for ChIP-Seq with or without control samples.


Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=4g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load macs
[+] Loading macs

[user@cn3144 ~]$ macs2
usage: macs2 [-h] [--version]

[user@cn3144 ~]$ cp $MACS_TEST_DATA/*.bam .
[user@cn3144 ~]$ ls -lh
-rw-rw-r-- 1 user group 387M Feb 14 10:19 ENCFF001NGB.bam
-rw-rw-r-- 1 user group 458M Feb 14 10:19 ENCFF001NHS.bam

[user@cn3144 ~]$ # ENCFF001NHS.bam is the control (IgG) data
[user@cn3144 ~]$ macs2 callpeaks \
    -t ENCFF001NGB.bam -c ENCFF001NHS.bam \
    -n test -g mm

INFO  @ Wed, 14 Feb 2018 09:47:20:
# Command line: callpeak -t /usr/local/apps/macs/TEST_DATA/ENCFF001NGB.bam -c /usr/local/apps/macs/TEST_DATA/ENCFF001NH
S.bam -n test -g mm
# name = test
# format = AUTO
# ChIP-seq file = ['/usr/local/apps/macs/TEST_DATA/ENCFF001NGB.bam']
# control file = ['/usr/local/apps/macs/TEST_DATA/ENCFF001NHS.bam']
# effective genome size = 1.87e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is off
# Paired-End mode is off

INFO  @ Wed, 14 Feb 2018 09:47:20: #1 read tag files...

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. macs.sh), which uses the input file 'macs.in'. For example:

#! /bin/bash

module load macs/ || exit 1
macs2 callpeak \
    -t treatment1.bam \
    -c control.bam \
    --call-summits \
    --tempdir /lscratch/$SLURM_JOB_ID \
    --qvalue 0.01 \
    -n treatment1 -g mm

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=2 --mem=10g  --gres=lscratch:20 macs.sh
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. macs.swarm). For example:

macs2 callpeak -t treatment1.bam -c control.bam --name treatment1
macs2 callpeak -t treatment2.bam -c control.bam --name treatment2
macs2 callpeak -t treatment3.bam -c control.bam --name treatment3

Submit this job using the swarm command.

swarm -f macs.swarm -g 10 --module macs/ --gres=lscratch:10
-g # Number of Gigabytes of memory required for each process (1 line in the swarm command file)
-t # Number of threads/CPUs required for each process (1 line in the swarm command file).
--module macs Loads the macs module for each subjob in the swarm