DANPOS on Biowulf

A toolkit for Dynamic Analysis of Nucleosome and Protein Occupancy by Sequencing

Documentation

DANPOS is extensively documented. To read the built-in help, type danpos.py -h. You can also view help for a specific command; for instance, danpos.py dpos -h returns detailed help for the dpos command.
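
For example, to browse the help for a single function after loading the module (printing help text is lightweight, so this is fine on the login node):

[user@biowulf]$ module load danpos
[user@biowulf]$ danpos.py profile -h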

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ mkdir -p /data/$USER/danpos_test

[user@cn3144 ~]$ cd /data/$USER/danpos_test

[user@cn3144 ~]$ module load danpos
[+] Loading danpos 2.2.2 on biowulf.nih.gov
[+] Loading python 2.7.10 ...
[+] Loading rpy2 2.7.0 (R version 3.2.2) ...
[+] Loading samtools 1.3.1 ...

[user@cn3144 ~]$ danpos.py -h
danpos version 2.2.2
For help information for each function, try:
python danpos.py <function> -h

Functions:
	dpos:
		analyze each protein-binding position (~100
		to ~200bp wide) across the whole genome,
		e.g. nucleosome positions.
	dpeak:
		analyze each protein-binding peak (~1 to ~1kb
		wide) across the whole genome, e.g. protein
		that binds accurately to some specific motifs.
	dregion:
		analyze each protein-binding region (~1 to
		~10kb wide) across the whole genome, e.g.
		some histone modifications.
	dtriple:
		Do analysis at all three levels including each
		region, peak, and position. Would be useful
		when little is known about the potential binding
		pattern.
	profile:
		analyze wiggle format occupancy or differential
		signal profile relative to gene structures or
		bed format genomic regions.
	wiq:
		normalize wiggle files to have the same quantiles (Quantile normalization).
	wig2wiq:
		convert wiggle format file to wiq format.
	stat:
		some statistics for positions, peaks, or regions.
	selector:
		select a subset of positions, peaks, or regions
		by value ranges or gene structures neighboring.
	valuesAtRanks:
		retrieve position, peak, or region values by ranks.

Kaifu Chen, et al. chenkaifu@gmail.com, Li lab, Biostatistics department, Dan L. Duncan cancer center, Baylor College of Medicine.

[user@cn3144 ~]$ TEST_DATA=/usr/local/apps/danpos/2.2.2/testdata

[user@cn3144 ~]$ danpos.py dpos $TEST_DATA/nucleosome_sampleA_rep1.bowtie > danpos_dpos.log
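[user@cn3144 ~]$ # danpos prints its progress to stdout; the redirect above
[user@cn3144 ~]$ # captures it in danpos_dpos.log, which can be inspected with e.g.:
[user@cn3144 ~]$ tail danpos_dpos.log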

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. danpos.sh). For example:

#!/bin/bash
set -e                          # stop at the first error
module load danpos
cd /data/$USER/output_dir       # an existing output directory under your data area
TEST_DATA=/usr/local/apps/danpos/2.2.2/testdata
# single-sample analysis
danpos.py dpos $TEST_DATA/nucleosome_sampleA_rep1.bowtie
# comparative analysis: the colon pairs sample A with sample B as its background/control
danpos.py dpos $TEST_DATA/nucleosome_sampleA_rep1.bowtie:$TEST_DATA/nucleosome_sampleB.bowtie

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] danpos.sh
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. danpos.swarm). For example:

danpos.py dpos /dir/to/data/rep1.bowtie 
danpos.py dpos /dir/to/data/rep2.bowtie 
danpos.py dpos /dir/to/data/rep3.bowtie 
....
danpos.py dpos /dir/to/data/repN.bowtie 

Submit this job using the swarm command.

swarm -f danpos.swarm [-g #] [-t #] --module danpos
where
-g # Number of Gigabytes of memory required for each process (1 line in the swarm command file)
-t # Number of threads/CPUs required for each process (1 line in the swarm command file)
--module danpos Loads the danpos module for each subjob in the swarm
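
For example, to request 4 GB of memory and 2 CPUs for each line of the swarm file (illustrative values; size them to your own data):

swarm -f danpos.swarm -g 4 -t 2 --module danpos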