Biowulf High Performance Computing at the NIH
ABC on Biowulf

The Activity-by-Contact (ABC) model predicts which enhancers regulate which genes in a cell-type-specific manner.


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):

[user@biowulf]$ sinteractive --cpus-per-task=2 --mem=2G
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load ABC
[user@cn3144 ~]$ mkdir /data/$USER/ABC_test/
[user@cn3144 ~]$ cd /data/$USER/ABC_test/
[user@cn3144 ~]$ cp -r ${ABC_TEST_DATA}/* .
[user@cn3144 ~]$ python /opt/ABC-Enhancer-Gene-Prediction/src/makeCandidateRegions.py \
--narrowPeak example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted \
--bam example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
--outDir example_chr22/ABC_output/Peaks/ \
--chrom_sizes example_chr22/reference/chr22 \
--regions_blocklist reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed \
--regions_includelist example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed \
--peakExtendFromSummit 250 \
--nStrongestPeaks 3000

Running: bedtools sort -i example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed -faidx example_chr22/reference/chr22 | bedtools merge -i stdin -c 4 -o max | sort -nr -k 4 | head -n 3000 |bedtools intersect -b stdin -a example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted -wa |awk '{print $1 "\t" $2 + $10 "\t" $2 + $10}' |bedtools slop -i stdin -b 250 -g example_chr22/reference/chr22 |bedtools sort -i stdin -faidx example_chr22/reference/chr22 |bedtools merge -i stdin | bedtools intersect -v -wa -a stdin -b reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed | cut -f 1-3 | (bedtools intersect -a example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed -b example_chr22/reference/chr22.bed -wa | cut -f 1-3 && cat) |bedtools sort -i stdin -faidx example_chr22/reference/chr22 | bedtools merge -i stdin > example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.candidateRegions.bed
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
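
The "Running:" line in the session above shows the pipeline that makeCandidateRegions.py executes. One of its steps centers each narrowPeak record on its summit (the start coordinate plus the offset in column 10) and then extends it by the --peakExtendFromSummit value on each side. The arithmetic of that step can be sketched with awk alone. The helper name extend_summits is hypothetical and not part of ABC, and unlike bedtools slop this sketch clamps only at position 0, not at the chromosome end.

```shell
# Hypothetical helper, not part of ABC: mimic the summit-centering step
# with awk only. Reads narrowPeak records on stdin.
# Assumption: column 10 holds the summit offset from the start coordinate.
extend_summits() {
    awk -v ext="${1:-250}" 'BEGIN { FS = OFS = "\t" }
    {
        summit = $2 + $10          # absolute summit position
        lo = summit - ext
        if (lo < 0) lo = 0         # clamp at the chromosome start only
        print $1, lo, summit + ext
    }'
}

# Possible usage with the example data (left commented out):
# extend_summits 250 < example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted | head
```

The real pipeline also merges overlapping regions, removes blocklisted regions, and adds the include-list regions, none of which this sketch attempts.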

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. ABC.sh). For example:


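#!/bin/bash
# Stop at the first command that fails.
set -e
module load ABC
# Create the working directory if it does not already exist.
mkdir -p /data/$USER/ABC_test/
cd /data/$USER/ABC_test/
# Copy the bundled test data into the working directory.
cp -r ${ABC_TEST_DATA}/* .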
python /opt/ABC-Enhancer-Gene-Prediction/src/run.neighborhoods.py \
--candidate_enhancer_regions \
example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.candidateRegions.bed \
--genes example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.chr22.bed \
--H3K27ac example_chr22/input_data/Chromatin/ENCFF384ZZM.chr22.bam \
--DHS \
example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam,\
example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep2.chr22.bam \
--expression_table example_chr22/input_data/Expression/K562.ENCFF934YBO.TPM.txt \
--chrom_sizes example_chr22/reference/chr22 \
--ubiquitously_expressed_genes reference/UbiquitouslyExpressedGenesHG19.txt \
--cellType K562 \
--outdir example_chr22/ABC_output/Neighborhoods/ 
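
The script above references several input files, so a mistyped path would only surface once the job starts running. A minimal sketch of a pre-flight check follows. The helper name require_files is hypothetical and not part of ABC or Slurm, and it only tests that each path exists and is non-empty.

```shell
# Hypothetical helper, not part of ABC or Slurm: report every listed path
# that is missing or empty, and return non-zero if any was found.
require_files() {
    local f missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible usage inside ABC.sh, before the python call (left commented out):
# require_files \
#     example_chr22/input_data/Chromatin/ENCFF384ZZM.chr22.bam \
#     example_chr22/input_data/Expression/K562.ENCFF934YBO.TPM.txt \
#     reference/UbiquitouslyExpressedGenesHG19.txt
```

Because the batch script uses set -e, a non-zero return from this check would stop the job before run.neighborhoods.py is invoked.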


Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=2 --mem=2g ABC.sh