Biowulf High Performance Computing at the NIH
Danpos on Biowulf

A toolkit for Dynamic Analysis of Nucleosome and Protein Occupancy by Sequencing


Important Notes

DANPOS is extensively documented. To read the general help, run the main program with the -h option. Each command also has its own help text; for instance, dpos -h returns detailed help about the dpos command.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ mkdir -p /data/$USER/danpos_test

[user@cn3144 ~]$ cd /data/$USER/danpos_test

[user@cn3144 ~]$ module load danpos
[+] Loading danpos 2.2.2 on
[+] Loading python 2.7.10 ...
[+] Loading rpy2 2.7.0 (R version 3.2.2) ...
[+] Loading samtools 1.3.1 ...

[user@cn3144 ~]$ -h
danpos version 2.2.2
For help information for each function, try:
python  -h

		analyze each protein-binding position (~100
		to ~200bp wide) across the whole genome,
		e.g. nucleosome positions.
		analyze each protein-binding peak (~1 to ~1kb
		wide) across the whole genome, e.g. protein
		that binds accruately to some specific motifs.
		analyze each protein-binding region (~1 to
		~10kb wide) across the whole genome, e.g.
		some histone modifications.
		Do analysis at all three levels including each
		region, peak, and position. Would be useful
		when little is known about the potential binding
		analyze wiggle format occupancy or differential
		signal profile relative to gene structures or
		bed format genomic regions.
		normalize wiggle files to have the same quantiles (Quantile normalization).
		convert wiggle format file to wiq format.
		some statistics for positions, peaks, or regions.
		select a subset of positions, peaks, or regions
		by value ranges or gene structures neighboring.
		retrieve position, peak, or region values by ranks.

Kaifu Chen, et al., Li lab, Biostatistics department, Dan L. Duncan cancer center, Baylor College of Medicine.

[user@cn3144 ~]$ TEST_DATA=/usr/local/apps/danpos/2.2.2/testdata

[user@cn3144 ~]$ dpos $TEST_DATA/nucleosome_sampleA_rep1.bowtie > danpos_dpos.log

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file. For example:

#!/bin/bash
module load danpos
cd /data/$USER/output_dir
TEST_DATA=/usr/local/apps/danpos/2.2.2/testdata
dpos $TEST_DATA/nucleosome_sampleA_rep1.bowtie
dpos $TEST_DATA/nucleosome_sampleA_rep1.bowtie:$TEST_DATA/nucleosome_sampleB.bowtie

Submit this job using the Slurm sbatch command.
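
As a hedged illustration of the submission step, the sketch below builds an sbatch command and runs it only where Slurm is present. The script name danpos.sh and the resource values (2 CPUs, 4 GB of memory) are assumptions made for illustration, not requirements stated on this page; --cpus-per-task and --mem are standard sbatch options.

```shell
# danpos.sh is a hypothetical name for the batch input file created above.
# The CPU and memory values are placeholders, not measured requirements.
batch_script=danpos.sh
submit_cmd="sbatch --cpus-per-task=2 --mem=4g $batch_script"

# Print the command so it can be checked before anything is submitted.
echo "$submit_cmd"

# Submit only where Slurm is available (for example, a Biowulf login node).
if command -v sbatch >/dev/null 2>&1; then
    $submit_cmd
fi
```

Adjust the two resource values to the size of the input data before submitting.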

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. danpos.swarm). For example:

dpos /dir/to/data/rep1.bowtie
dpos /dir/to/data/rep2.bowtie
dpos /dir/to/data/rep3.bowtie
....
dpos /dir/to/data/repN.bowtie
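
With many replicates, the swarmfile can be generated instead of typed by hand. The following is a minimal sketch, assuming all inputs sit in one directory and end in .bowtie; make_swarm is a hypothetical helper name, and the command it writes simply mirrors the swarmfile lines above.

```shell
# make_swarm DIR OUT: write one "dpos <file>" line per *.bowtie file in DIR.
# Hypothetical helper, not part of DANPOS or swarm.
make_swarm() {
    data_dir=$1
    swarm_file=$2
    : > "$swarm_file"                # start from an empty swarmfile
    for f in "$data_dir"/*.bowtie; do
        [ -e "$f" ] || continue      # skip the literal pattern when nothing matches
        printf 'dpos %s\n' "$f" >> "$swarm_file"
    done
}

# Example call (paths are placeholders):
# make_swarm /dir/to/data danpos.swarm
```

Each input file becomes one line of the swarmfile, so each file is run as its own subjob.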

Submit this job using the swarm command.

swarm -f danpos.swarm [-g #] [-t #] --module danpos

where

-g #              Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #              Number of threads/CPUs required for each process (1 line in the swarm command file)
--module danpos   Loads the danpos module for each subjob in the swarm
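
To make the -g and -t options concrete, here is a sketch of a full swarm invocation. The values of 4 GB and 2 threads per process are placeholders chosen for illustration only; the right numbers depend on the input data.

```shell
# Placeholder resources for each line of danpos.swarm (assumed values).
mem_gb=4
threads=2
swarm_cmd="swarm -f danpos.swarm -g $mem_gb -t $threads --module danpos"

# Print the command so it can be checked before anything is submitted.
echo "$swarm_cmd"

# Submit only where the swarm utility is installed.
if command -v swarm >/dev/null 2>&1; then
    $swarm_cmd
fi
```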