HMMRATAC (Hidden Markov ModeleR for ATAC-seq) splits a single ATAC-seq dataset into nucleosome-free and nucleosome-enriched signals, learns the unique chromatin structure around accessible regions, and then predicts accessible regions across the entire genome.
Allocate an interactive session and run the program.
Sample session (user input follows each shell prompt):
First, use samtools to prepare a sample.
[user@biowulf ~]$ sinteractive --cpus-per-task=4 --mem=12G --gres=lscratch:10
salloc.exe: Pending job allocation 41216741
salloc.exe: job 41216741 queued and waiting for resources
salloc.exe: job 41216741 has been allocated resources
salloc.exe: Granted job allocation 41216741
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3125 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
[user@cn3125 ~]$ cd /data/$USER/HMMRATAC/testdata
[user@cn3125 testdata]$ ls
ATACseq.bam
[user@cn3125 testdata]$ module load samtools
[+] Loading samtools 1.9 ...
[user@cn3125 testdata]$ samtools sort -@4 -m 1800M -T /lscratch/$SLURM_JOB_ID/ATACseq.bam \
    -o ATACseq.sorted.bam ATACseq.bam
[bam_sort_core] merging from 0 files and 4 in-memory blocks...
[user@cn3125 testdata]$ samtools index -@4 ATACseq.sorted.bam ATACseq.sorted.bam.bai
[user@cn3125 testdata]$ samtools view -H ATACseq.sorted.bam | \
    perl -ne 'if(/^@SQ.*?SN:(\w+)\s+LN:(\d+)/){print $1,"\t",$2,"\n"}' > \
    genome.info
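The perl one-liner above captures sequence names with `\w+`, so an `@SQ` line whose name contains a character such as `.` or `-` would not match and that sequence would be left out of genome.info. As a minimal sketch, assuming awk is available on the node, the hypothetical helper below (`sam_header_to_genome_info` is not part of samtools or HMMRATAC) builds the same two-column table by splitting each `@SQ` line on tabs:

```shell
#!/bin/bash
# Hypothetical helper: read a SAM header on stdin and print one
# "name<TAB>length" line per @SQ record, as expected in genome.info.
sam_header_to_genome_info() {
    awk -F'\t' '
        $1 == "@SQ" {
            name = ""; len = ""
            # SN: and LN: tags may appear in any order after @SQ
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len  = substr($i, 4)
            }
            if (name != "" && len != "") print name "\t" len
        }'
}
```

It could then replace the perl step: `samtools view -H ATACseq.sorted.bam | sam_header_to_genome_info > genome.info`.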
Then use HMMRATAC to analyze the sample. (Note that these --upper and --lower option/argument pairs may not be realistic.)
[user@cn3125 testdata]$ module load HMMRATAC
[+] Loading HMMRATAC 1.2.9 on cn3125
[+] Loading java 12.0.1 ...
[user@cn3125 testdata]$ java -jar $HMMRATAC_HOME/HMMRATAC.jar --upper 100 --lower 2 \
    --bam ATACseq.sorted.bam \
    --index ATACseq.sorted.bam.bai \
    --genome genome.info
Create a batch input file (e.g. HMMRATAC.sh). For example:
#!/bin/bash
#SBATCH --job-name="HMMRATAC-test"
#SBATCH --mem=12g
#SBATCH --ntasks=1
#SBATCH --partition=quick
#SBATCH --time=2:0:0
#SBATCH --cpus-per-task=4
# sbatch does not expand shell variables such as $USER in #SBATCH lines;
# the %u filename pattern stands for the user name.
#SBATCH --error=/data/%u/HMMRATAC-test.e
#SBATCH --output=/data/%u/HMMRATAC-test.o

module load HMMRATAC
cd /data/$USER/HMMRATAC/testdata
java -jar $HMMRATAC_HOME/HMMRATAC.jar \
    -b ATACseq.sorted.bam \
    -i ATACseq.sorted.bam.bai \
    -g genome.info
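Because a batch job may wait in the queue before it starts, it can be worth checking that the input files exist before calling HMMRATAC. A minimal sketch, assuming the three input files named in the script above; `check_hmmratac_inputs` is a hypothetical helper, not part of HMMRATAC or Slurm:

```shell
#!/bin/bash
# Hypothetical helper: succeed only if every listed file exists and is
# non-empty; report each problem file on stderr.
check_hmmratac_inputs() {
    local f status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

In the batch script it could be called just before the java command, for example `check_hmmratac_inputs ATACseq.sorted.bam ATACseq.sorted.bam.bai genome.info || exit 1`.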
Submit this job using the Slurm sbatch command.
[user@biowulf ~]$ sbatch HMMRATAC.sh
Create a swarmfile (e.g. HMMRATAC.swarm). For example:
java -jar $HMMRATAC_HOME/HMMRATAC.jar -b 1.sorted.bam -i 1.sorted.bam.bai -g 1.info
java -jar $HMMRATAC_HOME/HMMRATAC.jar -b 2.sorted.bam -i 2.sorted.bam.bai -g 2.info
java -jar $HMMRATAC_HOME/HMMRATAC.jar -b 3.sorted.bam -i 3.sorted.bam.bai -g 3.info
java -jar $HMMRATAC_HOME/HMMRATAC.jar -b 4.sorted.bam -i 4.sorted.bam.bai -g 4.info
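With many samples, the swarmfile can be generated rather than typed by hand. The sketch below assumes the naming pattern of the example above (`N.sorted.bam`, `N.sorted.bam.bai`, `N.info`, all in the current directory); `make_hmmratac_swarm` is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: print one HMMRATAC command per *.sorted.bam file in
# the current directory, assuming the matching .bai and .info files follow
# the same naming pattern.
make_hmmratac_swarm() {
    local bam sample
    for bam in *.sorted.bam; do
        [ -e "$bam" ] || continue      # no matches: the glob stays literal
        sample=${bam%.sorted.bam}
        # Single quotes keep $HMMRATAC_HOME unexpanded, so it is resolved
        # inside each subjob after the module has been loaded.
        printf 'java -jar $HMMRATAC_HOME/HMMRATAC.jar -b %s -i %s -g %s\n' \
            "$sample.sorted.bam" "$sample.sorted.bam.bai" "$sample.info"
    done
}
```

Running `make_hmmratac_swarm > HMMRATAC.swarm` in the data directory would write one line per sample.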
Submit this job using the swarm command.
[user@biowulf ~]$ swarm -f HMMRATAC.swarm [-g #] [-t #] --module HMMRATAC
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file). |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module HMMRATAC | Loads the HMMRATAC module for each subjob in the swarm |
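As an illustration of the options above, the sketch below fills in the placeholders with the 12 GB and 4 CPUs used in the interactive example. Whether those values suit a given dataset is an assumption, and the variable names are only illustrative. It prints the command for review rather than submitting it:

```shell
#!/bin/bash
# Assumed values, copied from the interactive session (--mem=12G,
# --cpus-per-task=4); adjust them for the data at hand.
mem_gb=12    # -g: gigabytes of memory per process
threads=4    # -t: threads/CPUs per process
swarm_cmd="swarm -f HMMRATAC.swarm -g $mem_gb -t $threads --module HMMRATAC"
echo "$swarm_cmd"    # review the printed command, then run it on the login node
```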