deepTools on Biowulf & Helix

deepTools is a suite of user-friendly tools for the visualization, quality control and normalization of data from high-throughput DNA sequencing experiments.

deepTools offers multiple methods for highly-customizable data visualization that immensely aid hypothesis generation and data interpretation. It also offers all the tools needed to create coverage files in standard bedGraph and bigWig file formats allowing various normalization procedures and comparisons between two files (for example, treatment and control).

deepTools programs can automatically thread across all available CPUs on a node, so make sure the -p flag is specified to restrict them to the CPUs allocated to your job, as in the examples below.

Running on Helix

$ module load deeptools
$ cd /data/$USER/dir
$ bamCoverage -b file.bam -o outfile -of bigwig -p 4
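
For the two-file comparisons mentioned above (e.g. treatment vs. control), bamCompare works the same way. A minimal sketch, assuming deepTools 3.x (where --operation selects the comparison) and hypothetical file names:

$ bamCompare -b1 treatment.bam -b2 control.bam --operation log2ratio -o log2ratio.bw -of bigwig -p 4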

Running a single batch job on Biowulf

1. Create a batch script similar to the one below.

#!/bin/bash

module load deeptools
cd /data/$USER/dir
bamCoverage -b file.bam -o outfile -of bigwig -p $SLURM_CPUS_PER_TASK

2. Submit the script on Biowulf:

$ sbatch --mem=10g --cpus-per-task=4 jobscript
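
Downstream visualization steps can be chained in the same batch script. The sketch below follows bamCoverage with computeMatrix and plotHeatmap; the genes.bed region file and the output names are hypothetical:

#!/bin/bash

module load deeptools
cd /data/$USER/dir
# build a coverage track from the alignments
bamCoverage -b file.bam -o coverage.bw -of bigwig -p $SLURM_CPUS_PER_TASK
# score the coverage over a set of regions ...
computeMatrix scale-regions -S coverage.bw -R genes.bed -o matrix.gz -p $SLURM_CPUS_PER_TASK
# ... and render the scores as a heatmap
plotHeatmap -m matrix.gz -o heatmap.png

Submit it with the same sbatch line as above.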

Running a swarm of jobs on Biowulf

Set up a swarm command file:

  cd /data/$USER/dir1; bamCoverage -b file.bam -o outfile -of bigwig -p $SLURM_CPUS_PER_TASK
  cd /data/$USER/dir2; bamCoverage -b file.bam -o outfile -of bigwig -p $SLURM_CPUS_PER_TASK
  cd /data/$USER/dir3; bamCoverage -b file.bam -o outfile -of bigwig -p $SLURM_CPUS_PER_TASK
	[......]
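
For many directories, such a file can be generated with a short loop instead of by hand. A sketch, assuming each directory contains a BAM named file.bam; note that \$SLURM_CPUS_PER_TASK is escaped so it is expanded by each subjob at run time, not when the file is written:

  for d in /data/$USER/dir*; do
      echo "cd $d; bamCoverage -b file.bam -o outfile -of bigwig -p \$SLURM_CPUS_PER_TASK"
  done > swarmfile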

Submit the swarm file:

  $ swarm -f swarmfile -t 4 -g 10 --module deeptools

-f: specify the name of the swarm command file
-t: specify the number of threads (CPUs) for each command line in the file
-g: specify the memory in GB for each command line in the file
--module: load the specified environment module(s) for each command line in the file
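
With -t 4, swarm allocates 4 CPUs to each subjob and $SLURM_CPUS_PER_TASK expands to 4, so the bamCoverage thread count matches the allocation automatically.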

For more information regarding running swarm, see swarm.html

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive --mem=10g --cpus-per-task=4
salloc.exe: Granted job allocation 16535

cn999$ module load deeptools
cn999$ cd /data/$USER
cn999$ bamCoverage -b file.bam -o outfile -of bigwig -p $SLURM_CPUS_PER_TASK

cn999$ exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use the --mem flag. For example

biowulf$ sinteractive --mem=20g

If more threads are needed, use the --cpus-per-task flag. For example

biowulf$ sinteractive --cpus-per-task=6
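
Both flags can be combined in a single allocation. For example

biowulf$ sinteractive --mem=20g --cpus-per-task=6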

Documentation

https://github.com/fidelram/deepTools/wiki