Biowulf High Performance Computing at the NIH
HMMRATAC on Biowulf

A Hidden Markov ModeleR for ATAC-seq. HMMRATAC splits a single ATAC-seq dataset into nucleosome-free and nucleosome-enriched signals, learns the unique chromatin structure around accessible regions, and then predicts accessible regions across the entire genome.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows each shell prompt):

First, use samtools to sort and index the BAM file and to write a genome information file (chromosome names and lengths) from its header.

[user@biowulf ~]$ sinteractive --cpus-per-task=4 --mem=12G --gres=lscratch:10
salloc.exe: Pending job allocation 41216741
salloc.exe: job 41216741 queued and waiting for resources
salloc.exe: job 41216741 has been allocated resources
salloc.exe: Granted job allocation 41216741
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3125 are ready for job
srun: error: x11: no local DISPLAY defined, skipping

[user@cn3125 ~]$ cd /data/$USER/HMMRATAC/testdata

[user@cn3125 testdata]$ ls
ATACseq.bam

[user@cn3125 testdata]$ module load samtools
[+] Loading samtools 1.9  ...

[user@cn3125 testdata]$ samtools sort -@4 -m 1800M -T /lscratch/$SLURM_JOB_ID/ATACseq.bam \
    -o ATACseq.sorted.bam ATACseq.bam
[bam_sort_core] merging from 0 files and 4 in-memory blocks...

[user@cn3125 testdata]$ samtools index -@4 ATACseq.sorted.bam ATACseq.sorted.bam.bai

[user@cn3125 testdata]$ samtools view -H ATACseq.sorted.bam | \
    perl -ne 'if(/^@SQ.*?SN:(\w+)\s+LN:(\d+)/){print $1,"\t",$2,"\n"}' > \
    genome.info
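
The Perl one-liner above only matches sequence names made of word characters and expects the LN: tag to follow SN: directly, so names containing characters such as `-` or `.` would be skipped. A minimal sketch of a more tolerant alternative, assuming a tab-delimited SAM header, is shown below. The function name header_to_genome_info is hypothetical and not part of samtools or HMMRATAC, and the header in the demonstration is made up.

```shell
#!/bin/bash
# Sketch: turn SAM header text into the two-column genome file (name<TAB>length).
# header_to_genome_info is a hypothetical helper, not part of samtools or HMMRATAC.
header_to_genome_info() {
    # @SQ lines are tab-delimited; the SN: and LN: tags may appear in any order.
    awk -F'\t' '
        $1 == "@SQ" {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len  = substr($i, 4)
            }
            if (name != "" && len != "") print name "\t" len
        }'
}

# Demonstration on a made-up header. On Biowulf the input would instead be:
#   samtools view -H ATACseq.sorted.bam | header_to_genome_info > genome.info
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chrM\tLN:16569\n' \
    | header_to_genome_info
```

Either approach should give the same genome.info file when all sequence names are plain alphanumeric strings.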

Then use HMMRATAC to analyze the sample. (Note that the --upper and --lower values shown here are illustrative and may not be realistic for real data.)

[user@cn3125 testdata]$ module load HMMRATAC
[+] Loading HMMRATAC  1.2.9  on cn3125
[+] Loading java 12.0.1  ...

[user@cn3125 testdata]$ java -jar $HMMRATAC_HOME/HMMRATAC.jar --upper 100 --lower 2 \
    --bam ATACseq.sorted.bam \
    --index ATACseq.sorted.bam.bai \
    --genome genome.info

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. HMMRATAC.sh). For example:

#!/bin/bash
#SBATCH --job-name="HMMRATAC-test"
#SBATCH --mem=12g
#SBATCH --ntasks=1
#SBATCH --partition=quick
#SBATCH --time=2:0:0
#SBATCH --cpus-per-task=4
# Slurm does not expand shell variables such as $USER in #SBATCH lines; %u is the user name.
#SBATCH --error=/data/%u/HMMRATAC-test.e
#SBATCH --output=/data/%u/HMMRATAC-test.o

module load HMMRATAC
cd /data/$USER/HMMRATAC/testdata

java -jar $HMMRATAC_HOME/HMMRATAC.jar \
    -b ATACseq.sorted.bam \
    -i ATACseq.sorted.bam.bai \
    -g genome.info
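
Because a batch job runs unattended, it can help to confirm that the three input files exist before starting Java. The following is a minimal sketch of such a guard. The function name require_files is hypothetical and not part of HMMRATAC or Slurm.

```shell
#!/bin/bash
# require_files is a hypothetical helper, not part of HMMRATAC or Slurm.
# It returns non-zero if any named file is missing or empty.
require_files() {
    rc=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# In the batch script this would sit just before the java command, e.g.:
#   require_files ATACseq.sorted.bam ATACseq.sorted.bam.bai genome.info || exit 1
```

With this guard in place, a missing index or genome file should cause the job to stop with a message in the error file rather than fail inside HMMRATAC.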

Submit this job using the Slurm sbatch command.

[user@biowulf ~]$ sbatch HMMRATAC.sh

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. HMMRATAC.swarm). For example:

java -jar $HMMRATAC_HOME/HMMRATAC.jar -b 1.sorted.bam -i 1.sorted.bam.bai -g 1.info
java -jar $HMMRATAC_HOME/HMMRATAC.jar -b 2.sorted.bam -i 2.sorted.bam.bai -g 2.info
java -jar $HMMRATAC_HOME/HMMRATAC.jar -b 3.sorted.bam -i 3.sorted.bam.bai -g 3.info
java -jar $HMMRATAC_HOME/HMMRATAC.jar -b 4.sorted.bam -i 4.sorted.bam.bai -g 4.info
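
For more than a handful of samples, the swarmfile can be generated rather than typed by hand. The sketch below assumes each sample follows the PREFIX.sorted.bam, PREFIX.sorted.bam.bai, PREFIX.info naming used above. The function name make_swarmfile is hypothetical.

```shell
#!/bin/bash
# make_swarmfile is a hypothetical helper; it prints one HMMRATAC command per sample prefix.
make_swarmfile() {
    for prefix in "$@"; do
        # Single quotes keep $HMMRATAC_HOME literal so that it is expanded
        # only when the subjob runs, after the module has been loaded.
        printf 'java -jar $HMMRATAC_HOME/HMMRATAC.jar -b %s.sorted.bam -i %s.sorted.bam.bai -g %s.info\n' \
            "$prefix" "$prefix" "$prefix"
    done
}

# Print the four lines shown above; redirect to a file to create the swarmfile:
#   make_swarmfile 1 2 3 4 > HMMRATAC.swarm
make_swarmfile 1 2 3 4
```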

Submit this job using the swarm command.

[user@biowulf ~]$ swarm -f HMMRATAC.swarm [-g #] [-t #] --module HMMRATAC

where
    -g #                Number of gigabytes of memory required for each process (1 line in the swarm command file)
    -t #                Number of threads/CPUs required for each process (1 line in the swarm command file)
    --module HMMRATAC   Loads the HMMRATAC module for each subjob in the swarm
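
As an illustration, the placeholders can be filled in with the same resources requested in the batch example (12 GB of memory and 4 CPUs per process). These values are an assumption carried over from that example, not a recommendation for real data. The sketch below only prints the resulting command line and does not submit anything.

```shell
#!/bin/bash
# Dry run: build the swarm command line and print it instead of submitting.
mem_gb=12    # -g: memory per process, matching --mem=12g in the batch example
threads=4    # -t: CPUs per process, matching --cpus-per-task=4 in the batch example
swarm_cmd="swarm -f HMMRATAC.swarm -g ${mem_gb} -t ${threads} --module HMMRATAC"
echo "$swarm_cmd"
```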