High-Performance Computing at the NIH
TSSpredator on Biowulf & Helix

Description

TSSpredator uses RNA-seq data to predict transcription start sites (TSSs) across species and/or conditions. TSSpredator input consists of a genome or a multiple sequence alignment of genomes, RNA-seq data in wiggle format separated by strand, and annotation information. In the case of multiple genomes, a single coordinate system is used based on the genome alignment.

TSSpredator can be used in two ways. In GUI mode, an analysis can be set up and run interactively. Alternatively, the GUI can be used only to create a config file, which is then run from the command line. A config file can also be generated automatically by another process.

References

Environment variables set

Web sites

On Helix

The preferred way to run TSSpredator is via a wrapper script:

helix$ module load TSSpredator
helix$ TSSpredator -h
usage: TSSpredator [configfile]

This is a wrapper for TSSpredator. It sets some default values for the jvm.

positional arguments:
  configfile         If provided, TSSpredator runs the analysis defined in the
                     config file. Otherwise, the GUI is started that can be
                     used to configure a run (default: None)

optional arguments:
  -h, --help         show this help message and exit
  -m MEM, --mem MEM  maximal heap memory allowed for the Java process.
                     *Format*: [kmg]. *Examples*: 1024k, 1000m, 2g,
                     32g. Passed to Java with -Xmx (default: 2g)
  -n, --dry-run

The wrapper script sets some reasonable defaults, automatically creates a temp dir, checks the environment, and cleans up temporary files when done. If local scratch is allocated, the wrapper will default to using it.

However, TSSpredator can also be run as described in the manual by directly invoking java and passing the TSSpredator jar file as well as options limiting memory and setting the temp file path. The path to the jar file is set by the environment module:

helix$ module load TSSpredator
helix$ echo $TSSPREDATOR_JAR
/usr/local/apps/TSSpredator/1-04/bin/TSSpredator.jar
helix$ java -Djava.io.tmpdir=<tmpdir> -Xmx<mem>g -jar $TSSPREDATOR_JAR

Here, tmpdir should point either to a unique directory in /scratch or, when running on a compute node, to /lscratch/${SLURM_JOBID}. Select the location appropriate to your situation and clean it up after the run.
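The temp directory handling described above can be sketched as a pair of small shell functions. This is a minimal sketch, not the wrapper's actual implementation: the function names pick_tmp_base and run_tsspredator_jar and the 4g default heap size are illustrative assumptions. Any arguments after the heap size are forwarded to the jar unchanged, so consult the manual for what the jar accepts.

```shell
# Sketch only: pick_tmp_base and run_tsspredator_jar are hypothetical helpers.

# Prefer job-local scratch on a compute node, otherwise fall back to /scratch.
pick_tmp_base() {
    if [[ -n "${SLURM_JOBID:-}" && -d "/lscratch/${SLURM_JOBID}" ]]; then
        echo "/lscratch/${SLURM_JOBID}"
    else
        echo "/scratch"
    fi
}

# Run the jar with a unique temp dir and remove that dir afterwards.
# $1 = max heap size (default 4g, an assumed value); remaining args go to the jar.
run_tsspredator_jar() {
    local mem="${1:-4g}"
    if (( $# > 0 )); then shift; fi
    local tmpdir status=0
    tmpdir="$(mktemp -d "$(pick_tmp_base)/tsspredator.XXXXXX")" || return 1
    java -Djava.io.tmpdir="$tmpdir" -Xmx"$mem" -jar "$TSSPREDATOR_JAR" "$@" || status=$?
    rm -rf "$tmpdir"   # clean up even if java failed
    return "$status"
}
```

After module load TSSpredator has set TSSPREDATOR_JAR, calling run_tsspredator_jar 4g would launch the jar with a 4 GB heap limit. For routine use, the wrapper script remains the simpler choice.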

TSSpredator can be run in two ways. If a config file is provided, it runs the analysis defined in that file. If no config file is provided, it starts a GUI that can be used to configure a run. To use a GUI-based program on Helix, connect to Helix via one of the methods that allow X11 programs to display on your local machine. Detailed instructions for different systems and methods are available on our documentation pages. Then start TSSpredator:

helix$ TSSpredator

This starts the following GUI (shown after values have been filled in):

TSSpredator GUI

The fields can be filled in as shown above by loading the test config file /usr/local/apps/TSSpredator/TEST_DATA/test.conf. Note that the output path in this file is invalid. To run the example data, change the output path to an existing directory, save the config file under a different name, and then either hit Run or exit and run it with the following command:

helix$ TSSpredator --mem=4G /path/to/the/modified/file.conf

A step-by-step walkthrough for creating a config file based on the example data can be found in /usr/local/apps/TSSpredator/TEST_DATA/HowTo.txt.

Helix is not meant for large-scale analysis work. For anything more than toy examples, please use Helix or Biowulf only to create configuration files, and run the actual analysis on a compute node.

Batch job on Biowulf

In order to run a batch job, a configuration file for an analysis has to be created with either the TSSpredator GUI or by some other process. Then a script like the following can be used to submit a batch job:

#! /bin/bash
#SBATCH --mem=8G
set -e

module load TSSpredator || exit 66
TSSpredator --mem=7G /some/file.conf

And the batch job is submitted as usual with

b2$ sbatch -c4 tsspredator.batch
2594384

Swarm of jobs on Biowulf

In order to run swarm jobs, configuration files for all analyses have to be created with either the TSSpredator GUI or by some other process. Then a swarm file like the following can be set up:

TSSpredator --mem=7G /path/to/file1.conf
TSSpredator --mem=7G /path/to/file2.conf
TSSpredator --mem=7G /path/to/file3.conf
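When there are many config files, the swarm file can be generated rather than written by hand. The following is a minimal sketch under stated assumptions: make_swarmfile is a hypothetical helper, the config files are assumed to sit in one directory with a .conf extension, and their paths are assumed to contain no spaces.

```shell
# Sketch only: make_swarmfile is a hypothetical helper, not part of TSSpredator.
# $1 = directory holding *.conf files, $2 = swarm file to write,
# $3 = heap size passed to the wrapper's --mem option (default 7G, as above).
make_swarmfile() {
    local confdir="$1" swarmfile="$2" mem="${3:-7G}"
    local conf
    : > "$swarmfile"                    # start with an empty swarm file
    for conf in "$confdir"/*.conf; do
        [[ -e "$conf" ]] || continue    # skip the literal pattern if nothing matched
        printf 'TSSpredator --mem=%s %s\n' "$mem" "$conf" >> "$swarmfile"
    done
}
```

For example, make_swarmfile /path/to/configs swarmfile writes one TSSpredator command per config file, in the shell's sorted glob order.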

And submitted like this (this time using local scratch space for temp files):

b2$ swarm --gres=lscratch:10G -g 8 --module TSSpredator/1-04 -f swarmfile

Interactive job on Biowulf

To run the GUI on an interactive node, connect to Biowulf using one of the methods that allow X11 applications to display locally (see above), then allocate an interactive session with X11 forwarding:

b2$ sinteractive --x11
salloc.exe: Granted job allocation 2611750
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0615 are ready for job
cn0615$ module load TSSpredator
cn0615$ TSSpredator
[...starts the GUI...]
cn0615$ exit
b2$

Once the interactive session is allocated, use TSSpredator as described above for Helix.

Documentation