High-Performance Computing at the NIH
TRUP on NIH HPC Systems

Cancer cells express many rearranged transcripts, which adds complexity to transcriptome analysis. As a unified pipeline, TRUP is designed to sensitively and accurately dissect the complexity of the cancer transcriptome by analyzing RNA-seq data obtained from tumour tissues. The current functionalities of TRUP include: 1) identification of fusion transcripts; 2) RNA-seq quality assessment; 3) gene-read counting. The fusion detection module in TRUP combines split-read/read-pair mapping with regional de-novo assembly to achieve a balance between sensitivity and precision.

On Helix

Sample session:


[susanc@helix ~]$ module load trup
[susanc@helix ~]$ perl RTrace.pl --help

# RTrace.pl --help

GENERAL OPTIONS (MUST SET):
        --runlevel      the steps of runlevel, from 1-8, either rl1-rl2 or rl. See below for options for each runlevel.
        --sampleName    the name of the lane needed to be processed (must set for runlevel 1-5)
        --seqType       set to 's' if it is a single-end sequencing experiment, or 'p' for paired-end (default).
        --runID         the ID of the run needed to be processed (default not set, must set if fastq files are ended with _R1_00X.fq)
        --merge         a comma-separated list of runIDs that are needed to be merged, the output will be written to the defined runID
        --root          the root directory of the pipeline (default is $bin/../PIPELINE/, MUST set using other dir)
        --anno          the annotations directory (default is $bin/../ANNOTATION/, MUST set using other dir)
        --species       specify the reference version of the species, such as hg19 (default), mm10.
        --patient       the patient id, which will be written into the target file for edgeR
        --tissue        the tissue type name (like 'normal', 'cancer'), for the target file for running edgeR and cuffdiff
        --Rbinary       the name of R executable, default is 'R'. Set if your R binary name is different.
.......
.......
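
As an illustration of how the general options above fit together, the following is a minimal dry-run sketch that assembles an RTrace.pl command line and prints it instead of running it. Only the option names come from the help text; the sample name, the directories, and the runlevel value are hypothetical placeholders, and the options required for a given runlevel should be checked against the full --help output.

```shell
#!/bin/bash
# Dry-run sketch: build an RTrace.pl command from the options shown in the help text.
# Every value below is a hypothetical placeholder, not a TRUP default.
sample="sampleA"                          # hypothetical sample/lane name
root="/data/${USER:-user}/trup_root"      # hypothetical pipeline root directory
anno="/data/${USER:-user}/trup_anno"      # hypothetical annotation directory

# --runlevel 1 is an assumed value; the help text allows a single level or a range.
cmd="perl RTrace.pl --runlevel 1 --sampleName $sample --seqType p --root $root --anno $anno --species hg19"

# Print the command so it can be inspected before it is actually run.
echo "$cmd"
```

Once the printed command looks right, it can be pasted into a session or batch script after `module load trup`.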

Batch job on Biowulf

Create a batch input file (e.g. myjob.sh). For example:

#!/bin/bash
module load trup

cd /data/$USER/dir
trup command
......

Then submit the file on Biowulf:

sbatch myjob.sh

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. myjob.swarm). For example:

# this file is called myjob.swarm
cd dir1;trup command
cd dir2;trup command
cd dir3;trup command
[...]

Submit this job using the swarm command.

swarm -f myjob.swarm --module trup
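
When there are many sample directories, the swarmfile can be generated with a loop rather than written by hand. The following is a minimal sketch under that assumption; dir1, dir2, dir3 and "trup command" are the same placeholders used in the swarmfile example and need to be replaced with real directories and a real command.

```shell
#!/bin/bash
# Sketch: write one swarm line per sample directory.
# The directory names and "trup command" are placeholders from the example swarmfile.
swarmfile="myjob.swarm"

: > "$swarmfile"                 # create the file, or empty it if it already exists
for d in dir1 dir2 dir3; do
    # Each line is an independent command that swarm runs as its own subjob.
    echo "cd $d;trup command" >> "$swarmfile"
done

cat "$swarmfile"                 # show the generated file for inspection
```

The resulting myjob.swarm can then be submitted with the swarm command shown above.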

Interactive job on Biowulf

Allocate an interactive session and run TRUP. Sample session:

[susanc@biowulf ~]$ sinteractive 
salloc.exe: Pending job allocation 15194042
salloc.exe: job 15194042 queued and waiting for resources
salloc.exe: job 15194042 has been allocated resources
salloc.exe: Granted job allocation 15194042
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1719 are ready for job

[susanc@cn1719 ~]$ module load trup

[susanc@cn1719 ~]$ trup command