High-Performance Computing at the NIH
RNA-SeQC on Biowulf & Helix

RNA-SeQC is a Java program that computes a series of quality control metrics for RNA-seq data. The input can be one or more BAM files. The output consists of HTML reports and tab-delimited files of metrics data. The program is useful for comparing sequencing quality across samples or experiments when evaluating different experimental parameters. It can also be run on individual samples as a quality control step before continuing with downstream analysis.

Running on Helix

$ module load rnaseqc
$ cd /data/$USER/dir
$ java -Xmx4g -jar $RNASEQCPATH/RNA-SeQC_v$VERSION.jar -r Ref.fa -s SampleFile -t file.gtf -o outdir

* Replace $VERSION with the application version, for example 1.1.8.
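The -s option above points at a sample file that lists the BAM files to process. The following is a minimal sketch of how such a file might be generated. It assumes the tab-delimited layout with the header columns "Sample ID", "Bam File" and "Notes" described in the RNA-SeQC usage message; check the RNA-SeQC documentation for your version before relying on it. The function name make_sample_file, the BAM paths and the "NA" note are placeholders, not part of RNA-SeQC.

```shell
#!/bin/bash
# Sketch only: write a tab-delimited sample file for the -s option.
# Assumption: the header is "Sample ID<TAB>Bam File<TAB>Notes".
# make_sample_file is a hypothetical helper, not an RNA-SeQC tool.
make_sample_file() {
    local out="$1"; shift
    local bam
    # Header line
    printf 'Sample ID\tBam File\tNotes\n' > "$out"
    # One row per BAM; the sample ID is the file name without ".bam",
    # and "NA" is a placeholder for the Notes column.
    for bam in "$@"; do
        printf '%s\t%s\t%s\n' "$(basename "$bam" .bam)" "$bam" "NA" >> "$out"
    done
}

# Example call (placeholder paths):
# make_sample_file SampleFile /data/$USER/dir/sampleA.bam /data/$USER/dir/sampleB.bam
```

The resulting file would then be passed as -s SampleFile in the java command.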

Running a single batch job on Biowulf

1. Create a script file similar to the lines below.

#!/bin/bash

module load rnaseqc
cd /data/$USER/dir
java -Xmx4g -jar $RNASEQCPATH/RNA-SeQC_v$VERSION.jar -r Ref.fa -s SampleFile -t file.gtf -o outdir

2. Submit the script on Biowulf:

$ sbatch --mem=4g jobscript
--mem: memory to allocate, given as a number with a unit suffix (for example, 4g for 4 GB). The value should match the -Xmx setting (-Xmx4g) in the script.
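Because the --mem value and the -Xmx value have to be kept in step, one option is to derive the Java heap flag from the allocation inside the job script. The sketch below assumes that Slurm exports SLURM_MEM_PER_NODE (taken here to be in megabytes) when --mem is given, and falls back to 4096 otherwise. heap_flag is a hypothetical helper name, and the script only prints the command rather than running it.

```shell
#!/bin/bash
# Sketch only: build the -Xmx flag from the memory Slurm allocated.
# heap_flag is a hypothetical helper, not part of the rnaseqc module.
heap_flag() {
    local mem_mb="$1"
    # Java accepts an "m" suffix for megabytes, e.g. -Xmx4096m
    echo "-Xmx${mem_mb}m"
}

# Assumption: SLURM_MEM_PER_NODE holds the --mem value in MB.
# Outside a job it is unset, so default to 4096 MB (4 GB).
mem_mb="${SLURM_MEM_PER_NODE:-4096}"

# Print the command that would be run; $VERSION must still be
# replaced with the application version.
echo "java $(heap_flag "$mem_mb") -jar \$RNASEQCPATH/RNA-SeQC_v\$VERSION.jar -r Ref.fa -s SampleFile -t file.gtf -o outdir"
```

With this approach, changing the sbatch --mem value is enough to change the heap size as well.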

Running a swarm of jobs on Biowulf

Set up a swarm command file:

  cd /data/$USER/dir1; java -Xmx4g -jar $RNASEQCPATH/RNA-SeQC_v$VERSION.jar -r Ref.fa -s SampleFile -t file.gtf -o outdir
  cd /data/$USER/dir2; java -Xmx4g -jar $RNASEQCPATH/RNA-SeQC_v$VERSION.jar -r Ref.fa -s SampleFile -t file.gtf -o outdir
  cd /data/$USER/dir3; java -Xmx4g -jar $RNASEQCPATH/RNA-SeQC_v$VERSION.jar -r Ref.fa -s SampleFile -t file.gtf -o outdir
	[......]
  

Submit the swarm file:

  $ swarm -g 4 -f swarmfile --module rnaseqc

-g: memory needed per command, in GB; this number should match the -Xmx setting (-Xmx4g) in the swarm file
-f: specify the swarmfile name
--module: load the required module for each command line in the file

For more information regarding running swarm, see swarm.html
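When there are many sample directories, the swarm command file can be generated rather than typed by hand. The following is a minimal sketch under the same assumptions as the swarm example above (one directory per run, each containing Ref.fa, SampleFile and file.gtf). make_swarm_lines is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: print one swarm command line per directory.
# make_swarm_lines is a hypothetical helper, not part of swarm.
make_swarm_lines() {
    local dir
    for dir in "$@"; do
        # Single quotes keep $RNASEQCPATH literal so that it is expanded
        # when the job runs, after the module has been loaded.
        # $VERSION must still be replaced with the application version.
        printf 'cd %s; java -Xmx4g -jar $RNASEQCPATH/RNA-SeQC_v$VERSION.jar -r Ref.fa -s SampleFile -t file.gtf -o outdir\n' "$dir"
    done
}

# Example: print the lines for three directories.
# Redirect to a file (for example, > swarmfile) to save them.
make_swarm_lines "/data/$USER/dir1" "/data/$USER/dir2" "/data/$USER/dir3"
```

The saved file can then be submitted with the swarm command shown above.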

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

biowulf$ sinteractive --mem=4g
salloc.exe: Granted job allocation 16535

cn999$ module load rnaseqc
cn999$ cd /data/$USER/dir
cn999$ java -Xmx4g -jar $RNASEQCPATH/RNA-SeQC_v$VERSION.jar -r Ref.fa -s SampleFile -t file.gtf -o outdir
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, request it with --mem, and remember to change the -Xmx value in your java command to match (for example, -Xmx10g):

biowulf$ sinteractive --mem=10g

Documentation

http://www.broadinstitute.org/cancer/cga/rnaseqc_run