Biowulf High Performance Computing at the NIH
ChIPseeqer: a comprehensive framework for the analysis of ChIP-seq data

ChIPseeqer is an integrative, comprehensive, fast and user-friendly computational framework for in-depth analysis of ChIP-seq datasets. It combines several computational tools to create easily customized workflows that can be adapted to the user’s needs and objectives.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=4g
[user@cn3200 ~]$ module load ChIPseeqer
[+] Loading gcc 4.8.5  ...
[+] Loading boost libraries v 5.10.1  ...
[+] Loading ChIPseeqer  2.1
[user@cn3200 ~]$ ChIPseeqerAnnotate
...
ChIPseeqerAnnotate
        --peakfile=FILE File with ChIP-seq peaks.
        --lenuP=INT     Define the length upstream of TSS. Default is 2000bp.
        --lendP=INT     Define the length downstream of TSS. Default is 2000bp.
        --lenuDW=INT    Define the length upstream of TES. Default is 2000bp.
        --lendDW=INT    Define the length downstream of TES. Default is 2000bp.
        --genome=STR    hg19 (human)
                        hg18 (human)
                        mm10 (mouse)
                        mm9 (mouse)
                        rn4 (rat)
                        dm3 (drosophila)
                        sacser (Saccharomyces cerevisiae)
                        zv9 (zebrafish)
        --db=STR        refSeq (available for hg19, hg18, mm10, mm9, rn4, dm3, zv9)
                        AceView (for hg19, hg18, mm9)
                        Ensembl (for hg19, hg18, mm10, mm9, rn4, dm3, zv9)
                        UCSCGenes (for hg19, hg18, mm10, mm9).
                        Default is refSeq.
        --mindistaway=INT  Define minimum distance away from transcripts, used to define the distal regions. Default is 2000bp.
        --maxdistal=INT Define maximum distance away from transcripts, used to define the distal regions. Default is 50000kb.
        --verbose=INT   Verbose mode. Default is 0.
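The options listed above can be combined on one command line. As a minimal sketch, the following assembles a call that annotates mouse peaks against Ensembl with a wider upstream promoter window. The peak file name my_peaks.txt and the chosen values are illustrative assumptions, not recommended settings; only options shown in the help text are used, and the command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical input file and illustrative option values (assumptions).
peakfile="my_peaks.txt"
genome="mm9"     # one of the genomes listed in the help text
db="Ensembl"     # the help text lists Ensembl as available for mm9
len_up=5000      # bp upstream of the TSS (default 2000)
len_down=1000    # bp downstream of the TSS (default 2000)

# Assemble the command line from the documented options.
cmd="ChIPseeqerAnnotate --peakfile=$peakfile --genome=$genome --db=$db --lenuP=$len_up --lendP=$len_down"

# Dry run: print the command so it can be checked before it is run.
echo "$cmd"
```

Once the printed line looks right, it can be pasted at the prompt of an interactive session after the ChIPseeqer module has been loaded.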
Copy the sample data into a data folder under the current directory:
[user@cn3200 ~]$ mkdir -p data && cp $CS_DATA/* data
Reset the CS_DATA environment variable to point to your local data folder:
[user@cn3200 ~]$ export CS_DATA=./data
The data folder contains a sample peaks file test_peaks.txt. To run the executable ChIPseeqerAnnotate on this file, type:
[user@cn3200 ~]$ ChIPseeqerAnnotate --peakfile=./data/test_peaks.txt --genome=hg19
Annotation files=/usr/local/apps/ChIPseeqer/2.1/src/dist/DATA/hg19/refSeq.new
Extracting [2000 - TSS - 2000] promoters ... Done.
Extracting [2000 - TES - 2000] downstream extremities ... Done.

Looking for distal peaks ... Looking for peaks that are > 2000 bp away from any refSeq genes
Extracting extended gene bodies ... Done.
Found 0 peaks within extended gene bodies (in test_peaks.txt.refSeq.GENEPEAKS), and 8 distant peaks (test_peaks.txt.refSeq.DISTPEAKS).

Looking for 2 closest genes ... Running FindClosestGene, with minimum distance 0...... Done (test_peaks.txt.refSeq.DISTPEAKS.refSeq.CLOSEST_NM.txt created).
Created test_peaks.txt.refSeq.DISTPEAKS.refSeq.GENEWITHPEAKS.txt
Converting RefSeq NM identifiers to ORFs......Done (test_peaks.txt.refSeq.DISTPEAKS.refSeq.CLOSEST_ORF.txt created).
Determining overlap between gene parts and ChIP-seq peaks ... Done (test_peaks.txt.refSeq.GP created).
Creating stats file ... Done (test_peaks.txt.refSeq.GP.stats created).
Creating frac file test_peaks.txt.refSeq.GP.frac .. Done
Creating transcript file with number of P, E, and I peaks for each transcript ... Done (test_peaks.txt.refSeq.GP.genes created).
Done (test_peaks.txt.refSeq.GP.genes.annotated created).
Creating list of peaks in promoters ... test_peaks.txt.refSeq.GP.promoters
Creating list of peaks in downstream extremities ... test_peaks.txt.refSeq.GP.DOWNEXTR
Creating list of peaks in exons ... test_peaks.txt.refSeq.GP.exons
Creating list of peaks in introns ... test_peaks.txt.refSeq.GP.introns
Creating list of peaks in introns 1 ... test_peaks.txt.refSeq.GP.introns1
Creating list of peaks in introns 2 ... test_peaks.txt.refSeq.GP.introns2
Creating list of peaks in distal regions (>2000 and <50000) ... test_peaks.txt.refSeq.GP.distal
Creating list of peaks in intergenic regions (>50000) ... test_peaks.txt.refSeq.GP.intergenic
Number of peaks:        8
Number of peaks that overlap with [-2000;2000] PROMOTERS:       0        (%0.0)
Number of peaks that overlap with [-2000;2000] DOWNSTREAM EXTREMITIES:  0        (%0.0)
Number of peaks that overlap with EXONS:        0        (%0.0)
Number of peaks that overlap with INTRONS:      0        (%0.0)
Number of peaks that overlap with DISTAL (>2000 and <50000):    0        (%0.0)
Number of peaks that overlap with INTERGENIC (>50000):  0        (%0.0)
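The run writes one file per genomic category, named after the peak file as the log shows. A small sketch for summarising them follows; it assumes the suffixes printed in the sample log (the list is not exhaustive), and the function name list_outputs is hypothetical.

```shell
#!/bin/sh
# list_outputs PREFIX: print the line count of each per-category output
# file that exists for PREFIX (e.g. test_peaks.txt.refSeq.GP).
# The suffixes are taken from the sample log above.
list_outputs() {
    prefix=$1
    for suffix in promoters DOWNEXTR exons introns distal intergenic; do
        f="$prefix.$suffix"
        if [ -f "$f" ]; then
            # Reading from stdin makes wc print only the count.
            printf '%s\t%s\n' "$suffix" "$(wc -l < "$f" | tr -d ' ')"
        else
            printf '%s\tmissing\n' "$suffix"
        fi
    done
}
```

For the sample run this would be called as list_outputs test_peaks.txt.refSeq.GP, with the prefix adjusted to whichever directory the output files were written to.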
End the interactive session:
[user@cn3200 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. ChIPseeqer.sh). For example:

#!/bin/bash
#SBATCH --mem=4g
module load ChIPseeqer
# Copy the sample data locally and point CS_DATA at it, as in the interactive session
mkdir -p data && cp $CS_DATA/* data
export CS_DATA=./data
ChIPseeqerAnnotate --peakfile=./data/test_peaks.txt --genome=hg19

Submit this job using the Slurm sbatch command.

sbatch ChIPseeqer.sh
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

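Create a swarmfile (e.g. ChIPseeqer.swarm) with one independent command per line. For example (sample1_peaks.txt and sample2_peaks.txt are placeholder file names):

ChIPseeqerAnnotate --peakfile=sample1_peaks.txt --genome=hg19
ChIPseeqerAnnotate --peakfile=sample2_peaks.txt --genome=hg19

Submit this job using the swarm command.

swarm -f ChIPseeqer.swarm -g 4 --module ChIPseeqer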
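
When many peak files need the same annotation, the swarmfile can be generated rather than typed by hand. The following is a minimal sketch, assuming the peak files match *_peaks.txt in a directory of your choosing; the file pattern, the function name make_swarm and the output file name in the example are all hypothetical. Each generated line is one independent ChIPseeqerAnnotate command using only the --peakfile and --genome options.

```shell
#!/bin/sh
# make_swarm DIR GENOME: print one ChIPseeqerAnnotate command per peak file
# found in DIR, suitable for redirecting into a swarmfile.
make_swarm() {
    dir=$1
    genome=$2
    for f in "$dir"/*_peaks.txt; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$f" ] || continue
        printf 'ChIPseeqerAnnotate --peakfile=%s --genome=%s\n' "$f" "$genome"
    done
}

# Example (hypothetical paths): make_swarm ./peaks hg19 > annotate.swarm
```

The resulting file can be checked by eye and then submitted with the swarm command in the usual way.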