High-Performance Computing at the NIH
Miso on Biowulf & Helix

MISO (Mixture-of-Isoforms) is a probabilistic framework that quantitates the expression level of alternatively spliced genes from RNA-Seq data, and identifies differentially regulated isoforms or exons across samples. By modeling the generative process by which reads are produced from isoforms in RNA-Seq, the MISO model uses Bayesian inference to compute the probability that a read originated from a particular isoform.

The MISO framework is described in Katz et al., Analysis and design of RNA sequencing experiments for identifying isoform regulation, Nature Methods (2010).

MISO Settings File

The default MISO settings file can be copied from /usr/local/apps/miso/miso_settings.txt and modified. It looks like this:

[data]
filter_results = True
min_event_reads = 20
[cluster]
cluster_command = sbatch --mem=10g --time=50:00:00
[sampler]
burn_in = 500
lag = 10
num_iters = 5000
num_chains = 6
num_processors = 2  

MISO can be run in two modes: multi-threaded mode and parallel (cluster) mode. Each mode uses or ignores different directives in the settings file.

Multi-threaded mode is used when miso is run without the '--use-cluster' flag.
In this mode the 'num_processors' line is used and the 'cluster_command' line is ignored. The default is 2 threads.

Parallel mode is used when miso is run with the '--use-cluster' and '--chunk-jobs' flags on Biowulf. In this mode the events are split into batches of '--chunk-jobs=#' events each, and each batch is submitted to the cluster as a separate job. Each of these jobs uses one thread regardless of the 'num_processors' line, which is ignored. The smaller the number given to '--chunk-jobs', the more jobs are created. Each job gets the default of 10gb of memory and 50 hours of walltime. If more memory or walltime is needed, copy and modify the settings file, include the '--settings-filename=.....' flag in the miso command, and then submit the miso job using sbatch.

If the '--settings-filename=.....' flag is not specified in the miso command, the default settings file will be used.
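
As an illustration, a modified copy of the settings file for cluster mode might raise the memory and walltime in the 'cluster_command' line. The values below are examples, not defaults:

# example values only; adjust --mem and --time to your data
[data]
filter_results = True
min_event_reads = 20
[cluster]
cluster_command = sbatch --mem=20g --time=100:00:00
[sampler]
burn_in = 500
lag = 10
num_iters = 5000
num_chains = 6
num_processors = 2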

Running on Helix
$ module load miso
$ cd /data/$USER/dir
$ miso --run ./indexed ./accepted_hits.bam --output-dir miso_out --read-len 76

Running a single batch job on Biowulf

1. Create a script file similar to the lines below.

#!/bin/bash
module load miso
cd /data/$USER/dir
miso --run ./indexed ./accepted_hits.bam --output-dir miso_outdir --read-len 76

2. Submit the script on biowulf:

$ sbatch jobscript

Since '--settings-filename' is not specified in the miso command, the default settings file will be used, and the job will run with the sbatch defaults of 4gb of memory, 2 threads, and 4 hours of walltime.

To request more memory, use the --mem flag.
To set the walltime, use --time=HH:MM:SS.
For example, to request 10gb of memory and 50 hours of walltime:

$ sbatch --mem=10g --time=50:00:00 jobscript

To use more threads, for example to run miso on 30 threads (the default is 2), copy the settings file and change the line to num_processors = 30 in miso_settings.txt.
Then modify the command line in the script to include the path of the settings file using the '--settings-filename' flag:

miso --run ./indexed ./accepted_hits.bam --output-dir miso_outdir \
    --read-len 76 --settings-filename=/data/$USER/....../miso_settings.txt

Then submit the script, adding the --cpus-per-task=30 flag:

$ sbatch --cpus-per-task=30 --mem=10g --time=50:00:00 script

Note: replacing '30' with $SLURM_CPUS_PER_TASK in the MISO settings file will not work; miso does not recognize this variable.
The 'cluster_command' line in the settings file will be ignored since this job is not running in cluster mode.
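
One possible workaround, shown here only as a sketch and not taken from the MISO documentation, is to patch a copy of the settings file at run time so that num_processors matches the allocated CPUs. The directory and file names below are examples:

#!/bin/bash
# Sketch of a workaround: copy the default settings file and set
# num_processors to the CPUs allocated with --cpus-per-task.
# Requires submitting with sbatch --cpus-per-task=N; paths are examples.
module load miso
cd /data/$USER/dir
cp /usr/local/apps/miso/miso_settings.txt my_miso_settings.txt
sed -i "s/^num_processors.*/num_processors = ${SLURM_CPUS_PER_TASK}/" my_miso_settings.txt
miso --run ./indexed ./accepted_hits.bam --output-dir miso_outdir \
    --read-len 76 --settings-filename=$PWD/my_miso_settings.txt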

Running a swarm of jobs on Biowulf

Set up a swarm command file:

  cd /data/$USER/dir1; miso --run ./indexed ./accepted_hits.bam --output-dir miso_out --read-len 76
  cd /data/$USER/dir2; miso --run ./indexed ./accepted_hits.bam --output-dir miso_out --read-len 76
  cd /data/$USER/dir3; miso --run ./indexed ./accepted_hits.bam --output-dir miso_out --read-len 76
	[......]

Submit the swarm file:

  $ swarm -f swarmfile --module miso

-f: specify the swarmfile name
--module: set environment variables for each command line in the file

To allocate more memory, use the -g flag:

  $ swarm -f swarmfile -g 10 --module miso

To allocate multiple threads for each swarm command, copy and modify the 'num_processors' line in miso_settings.txt (num_processors = 8, for example), modify each miso command to include --settings-filename=FullPathToSettingsFile, then submit the swarm job using the '-t' flag.

  $ swarm -f swarmfile -g 10 -t 8 --time 50:00:00 --module miso
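
Each line of the swarm file would then look similar to the following; the settings file path is an example and should point to your modified copy:

  cd /data/$USER/dir1; miso --run ./indexed ./accepted_hits.bam --output-dir miso_out --read-len 76 --settings-filename=/data/$USER/dir1/miso_settings.txt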

For more information regarding running swarm, see swarm.html

Running an Interactive Job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load miso
cn999$ cd /data/$USER/dir
cn999$ miso --run ./indexed ./accepted_hits.bam --output-dir miso_out --read-len 76

cn999$ exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use the --mem flag. For more walltime, use the --time flag. For more threads, use --cpus-per-task. For example:

biowulf$ sinteractive --mem=10g --time=50:00:00 --cpus-per-task=8

Running a Parallel Job on Biowulf

Refer to the section above regarding the MISO settings file.

1. Create a script file similar to the lines below.

#!/bin/bash
module load miso
cd /data/$USER/dir
miso --run ./indexed ./accepted_hits.bam --output-dir miso_out_cluster --read-len 76 \
--use-cluster --chunk-jobs=1000

2. Submit the script on biowulf:

$ sbatch jobscript

In this example, the events are split into chunks of 1000 events each, and each chunk is submitted as a separate job to the cluster. The fewer events assigned to each chunk, the more jobs are created and the sooner the whole run finishes. By default each job runs on 1 core with 10gb of memory; the default walltime is 50 hours.
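As a rough worked example, a run with about 15,000 events would be split into about 15 jobs at --chunk-jobs=1000, or about 150 jobs at --chunk-jobs=100 (compare the Benchmarks below).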

For more memory or walltime, copy and modify the miso_settings.txt file and add '--settings-filename=/data/$USER/..../miso_settings_cluster.txt' to the miso command.
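
Putting this together, a cluster-mode job script that uses a modified settings file might look like the following sketch; the settings file path is an example:

#!/bin/bash
# Sketch: cluster-mode miso run with a user-modified settings file.
# The settings file path is an example; use your own modified copy.
module load miso
cd /data/$USER/dir
miso --run ./indexed ./accepted_hits.bam --output-dir miso_out_cluster --read-len 76 \
    --use-cluster --chunk-jobs=1000 \
    --settings-filename=/data/$USER/dir/miso_settings_cluster.txt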

Benchmarks

The following benchmarks used a 14gb BAM file aligned to the fly genome:

Configuration         Mode                      Walltime
2 threads (default)   multi-threaded            44 hours
12 threads            multi-threaded            29 hours
24 threads            multi-threaded            15 hours
1000 events/chunk     cluster mode, 15 jobs     15 hours
100 events/chunk      cluster mode, 150 jobs    3.5 hours

Documentation

http://miso.readthedocs.org/en/fastmiso/