High-Performance Computing at the NIH
USeq on Biowulf and Helix

USeq is a collection of software tools for both low- and high-level analysis of next-generation, ultra-high-throughput signature sequencing data from the Solexa, SOLiD, and 454 platforms. The initial emphasis is on ChIP-Seq and RNA-Seq with FDR estimations. USeq is under continuous development at the Huntsman Cancer Institute in the Utah Bioinformatics Shared Resource Center.

USeq requires the R program. The appropriate paths can be set up with the 'module load useq' command. This module also sets two environment variables: $USEQ, which points to the location of the USeq jar files, and $USEQDOCS, which points to the documentation.

Submitting a single USeq batch job

1. Create a script file containing lines similar to those below.

#!/bin/bash
# This file is runUseq
#

module load useq

cd /data/user/mydir
java -jar $USEQ/Gr2Bar -f /data/user/myseqs/ -v hg19

2. Submit the script using the 'sbatch' command on Biowulf:

sbatch  runUseq
This will submit the job with 2 CPUs and 4 GB of memory.
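
If a USeq application needs more than this, resources can be requested on the sbatch command line. The following is a minimal sketch, assuming the standard Slurm options --cpus-per-task and --mem; the values (4 CPUs, 8 GB) are illustrative placeholders, not recommendations from the USeq documentation. The guard at the end is only there so the snippet does nothing on a machine without Slurm.

```shell
#!/bin/bash
# Illustrative resource request; adjust to what the USeq application needs.
cpus=4        # placeholder value
mem_gb=8      # placeholder value

# Build the submission command for the runUseq script created in step 1.
submit_cmd="sbatch --cpus-per-task=${cpus} --mem=${mem_gb}g runUseq"
echo "$submit_cmd"

# Submit only where Slurm is available and the script exists.
if command -v sbatch >/dev/null 2>&1 && [ -f runUseq ]; then
    $submit_cmd
fi
```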

Submitting a swarm of USeq jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:

module load useq; cd /data/user/myseqs; java -jar $USEQ/SamParser -v hg19 -f /data/user/myseqs1
module load useq; cd /data/user/myseqs; java -jar $USEQ/SamParser -v hg19 -f /data/user/myseqs2
[...]
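
For many inputs, a command file like the sample above can be generated with a short loop rather than typed by hand. This is a sketch only: the input directories and the output file name are the placeholder names from the sample, and the loop simply repeats the same SamParser command for each directory.

```shell
#!/bin/bash
# Sketch: write one swarm command line per input directory.
# The directory names and the output file name are placeholders.
cmdfile=cmdfile
: > "$cmdfile"    # start with an empty file

for dir in /data/user/myseqs1 /data/user/myseqs2; do
    # \$USEQ is escaped so that it is expanded when the job runs,
    # after 'module load useq' has set it, not when this file is written.
    echo "module load useq; cd /data/user/myseqs; java -jar \$USEQ/SamParser -v hg19 -f $dir" >> "$cmdfile"
done
```

Each line of the resulting file is an independent command, which is the form swarm expects.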

Submit this swarm with:

swarm -g # -f cmdfile
where '#' represents the amount of memory, in GB, required by a single command in the file above. The USeq documentation provides some clues about memory usage. For example, the USeq ChIPSeq program requires 6-8 GB of RAM. If you were running a swarm of ChIPSeq jobs that each required 8 GB of memory, you would submit with:
swarm -g 8 -f cmdfile

For more information regarding running swarm, see swarm.html

 

On Helix or a Biowulf interactive session

You can also run the GUI on Helix or on a Biowulf interactive session.

First log in to Helix or Biowulf with an X Windows connection. On Biowulf, request an interactive session with 'sinteractive --x11'.

If you have previously run USeq on Biowulf1, delete the ~/.GWrap directory and start afresh, as the pathnames and USeq version numbers on Biowulf2 differ from those on Biowulf1.

The first time you run the USeq GUI, you will need to create a directory called /home/username/.GWrap and link the USeq applications and documentation into it, as in the example below:

$ sinteractive --x11
salloc.exe: Granted job allocation 1288507
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0151 are ready for job
[user@cn0151 ~]$

[user@cn0151 ~]$ cd /data/YourUserID

[user@cn0151 ~]$ module load useq

[user@cn0151 ~]$ mkdir ~/.GWrap
[user@cn0151 ~]$ ln -s $USEQ ~/.GWrap/Apps
[user@cn0151 ~]$ ln -s $USEQDOCS ~/.GWrap/Documentation

[user@cn0151 ~]$ java -jar $USEQ/GWrap_GUI_ClickMe.jar 
Here findDirectories()
Looking for dirs in /home/user
Attempting to get them from webstart jar file.
Attempting to pull files using copyDocsApps()
src jar /usr/local/USeq_8.3.4/LibraryJars/bioToolsCodeLibrary.jar
Making prefs dialog
Adding Apps menu items /home/user/.GWrap/Apps
ABITraceTCPeakCalculator
AggregatePlotter
Alleler
AllelicMethylationDetector
[...etc...]
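
The first-time setup in the session above can also be wrapped in a small function that is safe to re-run. This is a hedged sketch, not part of USeq: setup_gwrap is a hypothetical helper name, and it assumes that Apps should point at the jar directory ($USEQ) and Documentation at $USEQDOCS, following the description of those two variables at the top of this page.

```shell
#!/bin/bash
# setup_gwrap APPS_DIR DOCS_DIR [GWRAP_DIR]
# Hypothetical helper: creates the GWrap directory and (re)creates the
# Apps and Documentation symbolic links inside it.
setup_gwrap() {
    local apps_dir=$1 docs_dir=$2 gwrap_dir=${3:-$HOME/.GWrap}
    mkdir -p "$gwrap_dir" || return 1
    # -f replaces an existing link; -n stops ln from following an old
    # link into its target directory and creating the new link there.
    ln -sfn "$apps_dir" "$gwrap_dir/Apps" &&
    ln -sfn "$docs_dir" "$gwrap_dir/Documentation"
}
```

After 'module load useq', it would be called as setup_gwrap "$USEQ" "$USEQDOCS". Because existing links are replaced, running it again after a USeq version change updates the paths without deleting ~/.GWrap by hand.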

Documentation

http://useq.sourceforge.net/