High-Performance Computing at the NIH
MACS on Biowulf and Helix

Model-based Analysis of ChIP-Seq (MACS) analyzes data produced by short-read sequencers such as the Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to capture local biases in the genome sequence, allowing more sensitive and robust prediction. MACS compares favorably with existing ChIP-Seq peak-finding algorithms and can be used for ChIP-Seq data with or without control samples.

MACS is written by Yong Zhang and Tao Liu from Xiaole Shirley Liu's Lab.


Running on Helix

Sample session:

helix$ module load macs

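The general syntax is:

helix$ macs2 [-h] [--version] \
      {callpeak,diffpeak,bdgpeakcall,bdgbroadcall,bdgcmp,bdgdiff,filterdup,predictd,pileup,randsample,refinepeak} ...

To list the options of a particular subcommand, add -h, for example:

helix$ macs2 callpeak -h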


Submitting a single batch job

A sample data set, FoxA1_ChIP-seq.tar.gz, can be copied from

/usr/local/apps/macs/FoxA1_ChIP-seq.tar.gz

1. Copy the sample file into your own directory and unpack it:

$ mkdir -p /data/$USER/macs/run1
$ cd /data/$USER/macs/run1
$ cp /usr/local/apps/macs/FoxA1_ChIP-seq.tar.gz .
$ tar -xzf FoxA1_ChIP-seq.tar.gz

2. Create a script file (e.g. myscript) along the following lines, adjusting the directory and input file names to match your data:

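#!/bin/bash

# Change to the directory that holds the input BED files
cd /data/$USER/macs/run1
module load macs
# Call peaks on the treatment tags against the input (control) tags
macs2 callpeak -t Treatment_tags.bed -c Input_tags.bed --name test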

3. Submit the script on Biowulf:

$ sbatch --mem=10g myscript

--mem: memory required, in GB (the default is 4 GB)
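Instead of passing --mem on the command line, the memory request can also be placed in the script itself as an #SBATCH directive; sbatch reads these comment lines from the top of the script, before the first command. The sketch below writes such a version of myscript with a heredoc. The job name macs_test is a hypothetical example, and the input file names are the same placeholders used in step 2.

```shell
#!/bin/bash
# Write a batch script that carries its own Slurm resource requests.
# The quoted 'EOF' delimiter keeps $USER unexpanded here; it is
# expanded later, when the job actually runs.
cat > myscript <<'EOF'
#!/bin/bash
#SBATCH --mem=10g
#SBATCH --job-name=macs_test

cd /data/$USER/macs/run1
module load macs
macs2 callpeak -t Treatment_tags.bed -c Input_tags.bed --name test
EOF
```

The script can then be submitted with a plain "sbatch myscript". Options given on the sbatch command line override the matching #SBATCH lines, so a one-off run can still request a different amount of memory.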

Submitting a swarm of jobs

1. Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:

macs2 callpeak -t file1.bed -c Input_tags.bed --name test1
macs2 callpeak -t file2.bed -c Input_tags.bed --name test2
macs2 callpeak -t file3.bed -c Input_tags.bed --name test3
macs2 callpeak -t file4.bed -c Input_tags.bed --name test4
......
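For many treatment files, the command file can be generated rather than typed by hand. The sketch below defines a hypothetical helper, make_cmdfile, that writes one macs2 callpeak line per treatment file, all sharing the same control. Unlike the sample above (test1, test2, ...), it names each run after its treatment file, and it assumes the file names contain no spaces.

```shell
# make_cmdfile CONTROL OUTFILE TREATMENT...
# Hypothetical helper: writes one "macs2 callpeak" line per treatment
# file to OUTFILE, all using the same CONTROL, and names each run after
# its treatment file (file1.bed -> --name file1).
make_cmdfile() {
    local control=$1 out=$2 t
    shift 2
    : > "$out"                      # create or truncate the command file
    for t in "$@"; do
        echo "macs2 callpeak -t $t -c $control --name $(basename "$t" .bed)" >> "$out"
    done
}
```

For example, "make_cmdfile Input_tags.bed cmdfile file*.bed" followed by "swarm -f cmdfile --module macs" would submit one callpeak job per matching BED file.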

Submit this swarm with

$ swarm -f cmdfile --module macs

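-f: name of the swarm command file
--module: load the macs module, which sets up the environment for each command

To request more memory for each command, add the -g option:

$ swarm -g 4 -f cmdfile --module macs

-g: memory required per process, in GB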

For more information on running swarm, see the swarm documentation (swarm.html).


Running an interactive job on Biowulf

Users sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node; instead, allocate an interactive node as described below and run the job there.

[user@biowulf]$ sinteractive 
      salloc.exe: Granted job allocation 1528

[user@pXXXX]$ cd /data/$USER/myruns

[user@pXXXX]$ module load macs

[user@pXXXX]$ macs2 callpeak -t Treatment_tags.bed -c Input_tags.bed --name test

[user@pXXXX]$ exit

[user@biowulf]$ 
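However it is run, callpeak writes its results to files prefixed with the --name value; in the default (narrow-peak) mode these include NAME_peaks.narrowPeak, NAME_peaks.xls and NAME_summits.bed. The sketch below uses a hypothetical helper, check_macs_output, to confirm that those files exist and to report how many peaks were called. It assumes the narrowPeak file has one peak per line and no header or track line.

```shell
# check_macs_output NAME
# Hypothetical helper: checks that the main default-mode callpeak outputs
# for run NAME exist, then prints the number of peaks called.
check_macs_output() {
    local name=$1 f peaks
    for f in "${name}_peaks.narrowPeak" "${name}_peaks.xls" "${name}_summits.bed"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done
    # Each line of the narrowPeak file describes one peak.
    peaks=$(wc -l < "${name}_peaks.narrowPeak")
    echo "${name}: $((peaks)) peaks"
}
```

For example, running "check_macs_output test" in the run directory after the session above would report the peak count for the run named test, or name the first expected file that is missing.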

Documentation

https://github.com/taoliu/MACS/blob/master/README.rst