Peakachu on HPC

A supervised learning framework for chromatin loop detection in genome-wide contact maps.

Documentation

https://github.com/tariks/peakachu

Important Notes

Module Name: peakachu (see the modules page for more information)
Test data files: $PEAKACHU_DATA

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive -c 4 --mem=8g --gres=lscratch:20
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load peakachu
[user@cn3144 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144 ~]$ cp -r $PEAKACHU_DATA .
[user@cn3144 ~]$ time peakachu train -r 10000 -p TEST_DATA/Rao2014-GM12878-MboI-allreps-filtered.10kb.cool --balance -O models -b TEST_DATA/gm12878.mumbach.h3k27ac-hichip.hg19.bedpe
collecting from chr1
collecting from chr2
collecting from chr3
collecting from chr4
...
[CV] END class_weight=None, criterion=gini, max_depth=25, max_features=sqrt, n_estimators=100, n_jobs=1; total time=   7.9s
{'class_weight': None, 'criterion': 'gini', 'max_depth': 25, 'max_features': 'sqrt', 'n_estimators': 100, 'n_jobs': 1}
0.8398682330848078

real    24m49.558s
user    33m29.951s
sys     0m50.692s
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job

Most jobs should be run as batch jobs.

Create a batch input file (e.g. batch.sh). For example:

#!/bin/bash
set -e
module load peakachu
peakachu train -r 10000 -p data.cool --balance -O models -b data.bedpe

Submit this job using the Slurm sbatch command.

sbatch batch.sh
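
The batch file above requests no resources of its own. As a minimal sketch, the requests used for the interactive session (4 CPUs, 8 GB of memory, 20 GB of lscratch) could be embedded as #SBATCH directives. The --time value is an assumption chosen to sit well above the roughly 25-minute sample run, and data.cool and data.bedpe remain the placeholder names from the example. The snippet only writes the file and syntax-checks it; it does not run peakachu.

```shell
# Write a hypothetical batch.sh with resource requests embedded as #SBATCH lines.
cat > batch.sh <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=8g
#SBATCH --gres=lscratch:20
# Assumed wall-time limit; adjust to the size of your data.
#SBATCH --time=02:00:00
set -e
module load peakachu
peakachu train -r 10000 -p data.cool --balance -O models -b data.bedpe
EOF

# Syntax-check the generated script without executing it.
bash -n batch.sh && echo "batch.sh OK"
```

The file is submitted the same way, with sbatch batch.sh. Options given on the sbatch command line take precedence over the #SBATCH lines.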

Swarm of Jobs

A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. job.swarm). For example:

cd dir1; peakachu train -r 10000 -p data.cool --balance -O models -b data1.bedpe
cd dir2; peakachu train -r 10000 -p data.cool --balance -O models -b data2.bedpe
cd dir3; peakachu train -r 10000 -p data.cool --balance -O models -b data3.bedpe

Submit this job using the swarm command.

swarm -f job.swarm --module peakachu
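
When there are many inputs, the swarmfile can be generated rather than typed by hand. The loop below is a sketch that reproduces the three example lines. It assumes the placeholder layout used above (directories dir1 to dir3, each with its own data.cool and a matching dataN.bedpe).

```shell
# Start with an empty swarmfile.
: > job.swarm

# Append one training command per directory (dir1..dir3 are placeholders).
for i in 1 2 3; do
    echo "cd dir${i}; peakachu train -r 10000 -p data.cool --balance -O models -b data${i}.bedpe" >> job.swarm
done

# Show the result before submitting.
cat job.swarm
```

If each command needs resources similar to the interactive session, swarm's -g (GB of memory per command) and -t (threads per command) options can request them, for example swarm -f job.swarm -g 8 -t 4 --module peakachu. The values here are assumptions carried over from the sample session.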