High-Performance Computing at the NIH
Bwtool on Biowulf
Description

bwtool is a command-line utility for bigWig files. bigWigs are an indexed and compressed form of wig file, a somewhat standard format for storing genome-wide real-valued signal data. Much of the ENCODE processed data is in this form, and it is appearing more often in GEO as well. The purpose of bwtool is to make these files more useful by providing some convenient functions.

bwtool's functionality is divided into subprograms that roughly fall into three categories: data extraction, analysis, and data modification, although for some programs, such as matrix and sax, the boundary between data extraction and analysis is not very sharp. The data modification programs all take a bigWig as input and produce a new bigWig as output.
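Running bwtool with no arguments lists the available subprograms, and running bwtool <program> with no arguments prints that program's usage. The lines below are a minimal sketch of typical extraction calls; the file names regions.bed, signal.bw, and out.txt are placeholders, and the exact argument order should be confirmed against each program's usage message.

# list the available subprograms
bwtool

# per-region summary statistics for the intervals in a BED file
bwtool summary regions.bed signal.bw out.txt

# raw per-base signal values over the same intervals
bwtool extract bed regions.bed signal.bw out.txt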

Documentation
Important Notes

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=10g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load bwtool
[user@cn3144 ~]$ bwtool commands 

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
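The session above requests 10 GB of memory. Other resources can be requested on the sinteractive command line if a particular bwtool run needs them; the values below are illustrative only.

[user@biowulf]$ sinteractive --mem=20g --cpus-per-task=4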

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. script.sh). For example:

#!/bin/bash
module load bwtool
bwtool commands

Submit this job using the Slurm sbatch command.

sbatch --mem=10g script.sh
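
Resource requests can also be embedded in the script with #SBATCH directives so that sbatch needs no extra options. A sketch, with the BED, bigWig, and output file names as placeholders:

#!/bin/bash
#SBATCH --mem=10g
#SBATCH --time=4:00:00

set -e
module load bwtool
bwtool summary regions.bed signal.bw out.txt

sbatch script.sh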
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. bwtool.swarm). For example:

cd /data/$USER/dir1; bwtool command1; bwtool command2
cd /data/$USER/dir2; bwtool command1; bwtool command2
cd /data/$USER/dir3; bwtool command1; bwtool command2

Submit this job using the swarm command.

swarm -f bwtool.swarm -g 10 --module bwtool
where
  -g #              Number of Gigabytes of memory required for each process (1 line in the swarm command file)
  --module bwtool   Loads the bwtool module for each subjob in the swarm
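
As a concrete sketch, a swarmfile in which each line summarizes one signal file over the same set of regions might look like the following; the directory, BED, and bigWig names are placeholders.

cd /data/$USER/dir1; bwtool summary regions.bed sample1.bw sample1_summary.txt
cd /data/$USER/dir2; bwtool summary regions.bed sample2.bw sample2_summary.txt
cd /data/$USER/dir3; bwtool summary regions.bed sample3.bw sample3_summary.txt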