High-Performance Computing at the NIH
REDItools on Biowulf & Helix

REDItools are Python scripts developed to study RNA editing at genomic scale using next-generation sequencing data. RNA editing is a post-transcriptional phenomenon involving the insertion, deletion, or substitution of specific bases at precise RNA locations. In humans, RNA editing occurs by deamination of cytosine to uridine (C-to-U) or, more commonly, by adenosine-to-inosine (A-to-I) conversion through ADAR enzymes. A-to-I substitutions may have profound functional consequences and have been linked to a variety of human diseases, including neurological and neurodegenerative disorders and cancer. Next-generation sequencing technologies offer a unique opportunity to investigate RNA editing in depth, even though, until now, no dedicated software has been released for this purpose. REDItools are simple Python scripts conceived to facilitate the investigation of RNA editing at large scale. They are intended for research groups that would like to explore this phenomenon in their own data but lack sufficient bioinformatics skills. They work on the main operating systems, can handle reads from any platform in the standard BAM format, and implement a variety of filters.

REDItools enable the analysis of RNA editing at three levels:

REDItools also include accessory scripts:

Example files can be copied from /usr/local/apps/reditools/testREDItools.tar.gz

Running on Helix

$ module load reditools
$ cp /usr/local/apps/reditools/testREDItools.tar.gz /data/$USER
$ cd /data/$USER ; tar xvfz testREDItools.tar.gz ; cd testREDItools
$ REDItoolDnaRna.py -i rna.bam -j dna.bam -f reference.fa -o outdir

Some of the scripts can be run multi-threaded using the -t flag. Do not use more than 4 threads on Helix.
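
The 4-thread limit above can be enforced before the command is built. The following is a minimal sketch, not part of REDItools: cap_threads is a hypothetical helper name, and the command is only printed so that it can be inspected before it is run.

```shell
#!/bin/bash
# Hypothetical helper (not part of REDItools): clamp a requested
# thread count to the 4-thread limit stated for Helix.
cap_threads() {
    local requested=$1
    local limit=4
    if [ "$requested" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$requested"
    fi
}

# Ask for 8 threads; the helper reduces this to 4.
threads=$(cap_threads 8)

# Print the command instead of running it, so it can be checked first.
echo "REDItoolDnaRna.py -i rna.bam -j dna.bam -f reference.fa -o outdir -t $threads"
```

Remove the echo (and the surrounding quotes) once the printed command looks right.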

Running a single batch job on Biowulf

1. Create a script file similar to the lines below.

#!/bin/bash

module load reditools
cd /data/$USER/dir
REDItoolDnaRna.py -i rna.bam -j dna.bam -f reference.fa -o outdir -t $SLURM_CPUS_PER_TASK

2. Submit the script on Biowulf. The value assigned to '--cpus-per-task' is passed to the script as '$SLURM_CPUS_PER_TASK':

$ sbatch --cpus-per-task=4 jobscript

If the job needs more memory than the default (2 GB per CPU, so 8 GB in this case), use the --mem flag:

$ sbatch --cpus-per-task=4 --mem=10g jobscript
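
The memory arithmetic above can be sketched in a few lines of shell. This assumes the default of 2 GB per CPU stated above; the variable names and the 10 GB requirement are illustrative only, and the sbatch command is printed rather than submitted.

```shell
#!/bin/bash
cpus=4          # value passed to --cpus-per-task
gb_per_cpu=2    # default memory per CPU, as stated above
needed_gb=10    # illustrative estimate of what the job needs

# Default allocation: 4 CPUs x 2 GB = 8 GB
default_gb=$(( cpus * gb_per_cpu ))

# Add --mem only when the job needs more than the default.
if [ "$needed_gb" -gt "$default_gb" ]; then
    sbatch_cmd="sbatch --cpus-per-task=$cpus --mem=${needed_gb}g jobscript"
else
    sbatch_cmd="sbatch --cpus-per-task=$cpus jobscript"
fi

echo "$sbatch_cmd"
```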

Running a swarm of jobs on Biowulf

Set up a swarm command file:

  cd /data/$USER/dir1; REDItoolDnaRna.py -i rna.bam -j dna.bam -f reference.fa -o outdir -t $SLURM_CPUS_PER_TASK
  cd /data/$USER/dir2; REDItoolDnaRna.py -i rna.bam -j dna.bam -f reference.fa -o outdir -t $SLURM_CPUS_PER_TASK
  cd /data/$USER/dir3; REDItoolDnaRna.py -i rna.bam -j dna.bam -f reference.fa -o outdir -t $SLURM_CPUS_PER_TASK
	[......]
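
When there are many directories, the swarm command file can be generated rather than typed by hand. The following is a sketch under stated assumptions: make_swarmfile is a hypothetical helper, and every directory is assumed to sit under /data/$USER and to contain files named rna.bam, dna.bam and reference.fa, as in the lines above.

```shell
#!/bin/bash
# Hypothetical helper: write one swarm command line per directory name.
# Usage: make_swarmfile OUTPUT_FILE DIR [DIR ...]
make_swarmfile() {
    local out=$1 d
    shift
    : > "$out"    # create or truncate the output file
    for d in "$@"; do
        # The single-quoted format string keeps $USER and
        # $SLURM_CPUS_PER_TASK literal, so they are expanded
        # when the swarm job runs, not when this file is written.
        printf 'cd /data/$USER/%s; REDItoolDnaRna.py -i rna.bam -j dna.bam -f reference.fa -o outdir -t $SLURM_CPUS_PER_TASK\n' "$d" >> "$out"
    done
}

make_swarmfile swarmfile dir1 dir2 dir3
```

The resulting file has one line per directory, in the same form as the hand-written lines above.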

Submit the swarm file:

  $ swarm -f swarmfile -t 4 --module reditools

-f: name of the swarm command file
-t: number of threads allocated to each command
--module: environment module(s) to load for each command line in the file

To allocate more memory, use the -g flag:

  $ swarm -f swarmfile -t 4 -g 10 --module reditools

-g: memory, in GB, allocated to each command

For more information regarding running swarm, see swarm.html

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive --cpus-per-task=4
salloc.exe: Granted job allocation 16535

cn999$ module load reditools
cn999$ cd /data/$USER/dir
cn999$ REDItoolDnaRna.py -i rna.bam -j dna.bam -f reference.fa -o outdir -t $SLURM_CPUS_PER_TASK

cn999$ exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use the --mem flag, for example:

biowulf$ sinteractive --cpus-per-task=4 --mem=10g
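
Slurm sets $SLURM_CPUS_PER_TASK only when --cpus-per-task is requested, so a command that ends in "-t $SLURM_CPUS_PER_TASK" is left with an empty thread count in a session started without it. One possible guard is sketched below; pick_threads is a hypothetical helper name, and the fallback of 1 thread is an assumption, not a REDItools default.

```shell
#!/bin/bash
# Hypothetical helper: use the Slurm CPU count when it is set and
# non-empty, otherwise fall back to a single thread.
pick_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

threads=$(pick_threads)

# Print the command instead of running it, so it can be checked first.
echo "REDItoolDnaRna.py -i rna.bam -j dna.bam -f reference.fa -o outdir -t $threads"
```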

Documentation

http://150.145.82.212/ernesto/reditools/doc/