Biowulf High Performance Computing at the NIH
Chipseq Pipeline on Biowulf

Description: The AQUAS pipeline is based on the ENCODE (phase-3) transcription factor and histone ChIP-seq pipeline specifications by Anshul Kundaje. Note that this is NOT the official ENCODE (phase-3) pipeline, but a free, open-source implementation that adheres to those specifications.


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive -c 16 --mem=20g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ cd /data/$USER/chipseq/

[user@cn3144 ~]$ module load chipseq_pipeline

[user@cn3144 ~]$ chipseq_pipelines
# Running chipseq_pipelines with no arguments prints the available options

[user@cn3144 ~]$ chipseq_pipelines -bam1 sample1.bam \
                    -bam2 sample2.bam \
                    -ctl_bam1 control1.bam \
                    -ctl_bam2 control2.bam \
                    -chrsz /fdb/chipseq_pipeline/hg19/hg19.chrom.sizes \
                    -gensz hs \
                    -rm_chr_from_tag "_" \
                    -wt_spp 2h \
                    -no_spp \
                    -nth $SLURM_CPUS_PER_TASK


[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Note: These commands were tested with actual user data; no sample data is available.
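Before launching the pipeline, it can help to confirm that every input BAM exists and is non-empty, since a missing file will otherwise fail partway through a long run. A minimal sketch; the filenames are the placeholder names used in the example above:

```shell
# Verify that each input BAM is present and non-empty before launching the
# pipeline. Filenames are the placeholders from the example session above.
missing=0
for f in sample1.bam sample2.bam control1.bam control2.bam; do
    [ -s "$f" ] || { echo "missing or empty: $f"; missing=1; }
done
# $missing is 1 if any file is absent or empty, 0 otherwise
```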

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. Chipseq.sh). For example:

#!/bin/bash
set -e
module load chipseq_pipeline

# Run the AQUAS ChIP-seq pipeline on two replicate BAMs with matched controls.
# -chrsz: chromosome sizes file; -gensz hs: human genome;
# -nth: thread count, taken from the Slurm allocation.
chipseq_pipelines -bam1 sample1.bam \
    -bam2 sample2.bam \
    -ctl_bam1 control1.bam \
    -ctl_bam2 control2.bam \
    -chrsz /fdb/chipseq_pipeline/hg19/hg19.chrom.sizes \
    -gensz hs \
    -rm_chr_from_tag "_" \
    -wt_spp 2h \
    -no_spp \
    -nth $SLURM_CPUS_PER_TASK

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=16] [--mem=20g] Chipseq.sh
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. Chipseq.swarm). For example:

chipseq_pipelines -bam1 sampleA_1.bam -bam2 sampleA_2.bam ... -nth $SLURM_CPUS_PER_TASK
chipseq_pipelines -bam1 sampleB_1.bam -bam2 sampleB_2.bam ... -nth $SLURM_CPUS_PER_TASK
chipseq_pipelines -bam1 sampleC_1.bam -bam2 sampleC_2.bam ... -nth $SLURM_CPUS_PER_TASK
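With many sample pairs, the swarm file can be generated with a short loop rather than written by hand. A sketch; the sample names are hypothetical, and the remaining pipeline options (-ctl_bam1, -chrsz, etc.) should be added as in the batch example above:

```shell
# Write one chipseq_pipelines command per sample pair into Chipseq.swarm.
# Sample names are hypothetical placeholders; append the remaining options
# (-ctl_bam1, -chrsz, -gensz, ...) as shown in the batch example.
for s in sampleA sampleB sampleC; do
    printf 'chipseq_pipelines -bam1 %s_1.bam -bam2 %s_2.bam -nth $SLURM_CPUS_PER_TASK\n' "$s" "$s"
done > Chipseq.swarm
```

The single quotes around the printf format keep $SLURM_CPUS_PER_TASK literal, so it is expanded by each swarm subjob rather than at generation time.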

Submit this job using the swarm command.

swarm -f Chipseq.swarm [-g #] [-t #] --module chipseq_pipeline
where
-g # Number of Gigabytes of memory required for each process (1 line in the swarm command file)
-t # Number of threads/CPUs required for each process (1 line in the swarm command file)
--module chipseq_pipeline Loads the chipseq_pipeline module for each subjob in the swarm