Model-based Analysis of ChIP-Seq (MACS) is used on short reads sequencers such as Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms and can be used for ChIP-Seq with or without control samples.
$MACS_TEST_DATA
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=4g --gres=lscratch:5 salloc.exe: Pending job allocation 46116226 salloc.exe: job 46116226 queued and waiting for resources salloc.exe: job 46116226 has been allocated resources salloc.exe: Granted job allocation 46116226 salloc.exe: Waiting for resource configuration salloc.exe: Nodes cn3144 are ready for job [user@cn3144 ~]$ module load macs [+] Loading macs 2.2.6 [user@cn3144 ~]$ macs2 usage: macs2 [-h] [--version] {callpeak,bdgpeakcall,bdgbroadcall,bdgcmp,bdgopt,cmbreps,bdgdiff,filterdup,predictd,pileup,randsample,refinepeak} ... [user@cn3144 ~]$ cp $MACS_TEST_DATA/*.bam . [user@cn3144 ~]$ ls -lh -rw-rw-r-- 1 user group 387M Feb 14 10:19 ENCFF001NGB.bam -rw-rw-r-- 1 user group 458M Feb 14 10:19 ENCFF001NHS.bam [user@cn3144 ~]$ # ENCFF001NHS.bam is the control (IgG) data [user@cn3144 ~]$ macs2 callpeak \ -t ENCFF001NGB.bam -c ENCFF001NHS.bam \ -n test -g mm INFO @ Wed, 14 Feb 2018 09:47:20: # Command line: callpeak -t /usr/local/apps/macs/TEST_DATA/ENCFF001NGB.bam -c /usr/local/apps/macs/TEST_DATA/ENCFF001NH S.bam -n test -g mm # ARGUMENTS LIST: # name = test # format = AUTO # ChIP-seq file = ['/usr/local/apps/macs/TEST_DATA/ENCFF001NGB.bam'] # control file = ['/usr/local/apps/macs/TEST_DATA/ENCFF001NHS.bam'] # effective genome size = 1.87e+09 # band width = 300 # model fold = [5, 50] # qvalue cutoff = 5.00e-02 # Larger dataset will be scaled towards smaller dataset. # Range for calculating regional lambda is: 1000 bps and 10000 bps # Broad region calling is off # Paired-End mode is off INFO @ Wed, 14 Feb 2018 09:47:20: #1 read tag files... [...snip...] [user@cn3144 ~]$ module load macs/3 [user@cn3144 ~]$ macs3 -h usage: macs3 [-h] [--version] {callpeak,bdgpeakcall,bdgbroadcall,bdgcmp,bdgopt,cmbreps,bdgdiff,filterdup,predictd,pileup,randsample,refinepeak,callvar,hmmratac} ... macs3 -- Model-based Analysis for ChIP-Sequencing positional arguments: {callpeak,bdgpeakcall,bdgbroadcall,bdgcmp,bdgopt,cmbreps,bdgdiff,filterdup,predictd,pileup,randsample,refinepeak,callvar,hmmratac} callpeak Main MACS3 Function: Call peaks from alignment results. bdgpeakcall Call peaks from bedGraph file. bdgbroadcall Call nested broad peaks from bedGraph file. bdgcmp Comparing two signal tracks in bedGraph format. bdgopt Operate the score column of bedGraph file. cmbreps Combine bedGraph files of scores from replicates. bdgdiff Differential peak detection based on paired four bedGraph files. filterdup Remove duplicate reads, then save in BED/BEDPE format file. predictd Predict d or fragment size from alignment results. In case of PE data, report the average insertion/fragment size from all pairs. pileup Pileup aligned reads (single-end) or fragments (paired-end). randsample Randomly choose a number/percentage of total reads, then save in BED/BEDPE format file. refinepeak Take raw reads alignment, refine peak summits. Inspired by SPP. callvar Call variants in given peak regions from the alignment BAM files. hmmratac Dedicated peak calling based on Hidden Markov Model for ATAC-seq data. optional arguments: -h, --help show this help message and exit --version show program's version number and exit For command line options of each command, type: macs3 COMMAND -h [user@cn3144 ~]$ exit salloc.exe: Relinquishing job allocation 46116226 [user@biowulf ~]$
Create a batch input file (e.g. macs.sh), which uses the input file 'macs.in'. For example:
#! /bin/bash module load macs || exit 1 macs2 callpeak \ -t treatment1.bam \ -c control.bam \ --call-summits \ --tempdir /lscratch/$SLURM_JOB_ID \ --qvalue 0.01 \ -n treatment1 -g mm
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=2 --mem=10g --gres=lscratch:20 macs.sh
Create a swarmfile (e.g. macs.swarm). For example:
macs2 callpeak -t treatment1.bam -c control.bam --name treatment1 macs2 callpeak -t treatment2.bam -c control.bam --name treatment2 macs2 callpeak -t treatment3.bam -c control.bam --name treatment3
Submit this job using the swarm command.
swarm -f macs.swarm -g 10 --module macs --gres=lscratch:10where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module macs | Loads the macs module for each subjob in the swarm |