High-Performance Computing at the NIH
MATS on Biowulf & Helix

MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data. The statistical model of MATS calculates the P-value and false discovery rate that the difference in the isoform ratio of a gene between two conditions exceeds a given user-defined threshold. From the RNA-Seq data, MATS can automatically detect and analyze alternative splicing events corresponding to all major types of alternative splicing patterns. MATS handles replicate RNA-Seq data from both paired and unpaired study designs.

Examples can be copied from /usr/local/apps/mats/testData, e.g.

$ cp -r /usr/local/apps/mats/testData /data/$USER/dir
$ cp -r /usr/local/apps/mats/testRun.sh /data/$USER/dir

MATS-NIH

The MATS-NIH tool produces the same results as the MATS tool while making use of the multithreading capabilities that exist on Biowulf nodes. MATS-NIH was developed by George Zaki (NCI). To use MATS-NIH, use module load mats-nih.

Note that the multi-threading in MATS-NIH will mean that the job uses more memory. You should request memory corresponding to the number of threads. If your job dies unexpectedly, you should check the memory usage with

jobhist jobnumber
and increase the memory allocation if required.
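
One way to keep the memory request in step with the thread count is to derive both from the same number. The sketch below is only illustrative: the helper mem_for_threads and the figure of 2 GB per thread are assumptions, not measured MATS-NIH values, so adjust them against what jobhist reports for your own jobs.

```shell
#!/bin/bash
# Hypothetical helper: compute a --mem value from a thread count.
# The default of 2 GB per thread is an assumed figure, not a MATS-NIH measurement.
mem_for_threads() {
    local threads=$1
    local gb_per_thread=${2:-2}
    echo "$(( threads * gb_per_thread ))g"
}

threads=8
mem=$(mem_for_threads "$threads")
# Print the submission command instead of running it, so it can be reviewed first.
echo "sbatch --mem=$mem --cpus-per-task=$threads jobscript"
```

If a job still dies, raise the per-thread figure (the optional second argument) and resubmit.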

The test data can be copied from /usr/local/apps/mats/testData for both MATS and MATS-NIH.

Running on Helix

Sample session:

$ module load mats
$ cd /data/$USER/dir

$ testRun.sh /fdb/STAR/iGenomes/Homo_sapiens/UCSC/hg19
Testing rMATS with BAM input files
========== SE ========
Junction Counts Only: There are 25 AS events. Of these, 1 events are statistically significant
1 significant events have higher inclusion level for SAMPLE_1 and 0 events for SAMPLE_2
Junction Counts and Reads on target Exon Counts: There are 35 AS events. Of these, 1 events are statistically significant
1 significant events have higher inclusion level for SAMPLE_1 and 0 events for SAMPLE_2
========== MXE ========
Junction Counts Only: There are 13 AS events. Of these, 0 events are statistically significant
0 significant events have higher inclusion level for SAMPLE_1 and 0 events for SAMPLE_2
Junction Counts and Reads on target Exon Counts: There are 20 AS events. Of these, 0 events are statistically significant
0 significant events have higher inclusion level for SAMPLE_1 and 0 events for SAMPLE_2
========== A5SS ========
Junction Counts Only: There are 2 AS events. Of these, 0 events are statistically significant
0 significant events have higher inclusion level for SAMPLE_1 and 0 events for SAMPLE_2
Junction Counts and Reads on target Exon Counts: There are 5 AS events. Of these, 0 events are statistically significant
0 significant events have higher inclusion level for SAMPLE_1 and 0 events for SAMPLE_2
========== A3SS ========
Junction Counts Only: There are 6 AS events. Of these, 0 events are statistically significant
0 significant events have higher inclusion level for SAMPLE_1 and 0 events for SAMPLE_2
Junction Counts and Reads on target Exon Counts: There are 9 AS events. Of these, 0 events are statistically significant
0 significant events have higher inclusion level for SAMPLE_1 and 0 events for SAMPLE_2
========== RI ========
Junction Counts Only: There are 12 AS events. Of these, 0 events are statistically significant
0 significant events have higher inclusion level for SAMPLE_1 and 0 events for SAMPLE_2
Junction Counts and Reads on target Exon Counts: There are 27 AS events. Of these, 0 events are statistically significant
0 significant events have higher inclusion level for SAMPLE_1 and 0 events for SAMPLE_2

#####
Please delete sam files in /data/$USER/mats/bam_test/SAMPLE_*/REP_*/ folder to save storage space!
#####

Testing MATS with FASTQ input files..
This step involves mapping to GTF and could take up to an hour..

Testing MATS finished..  

Running a single batch job on Biowulf

1. Create a script file similar to the lines below.

#!/bin/bash

module load mats
cd /data/$USER/dir
python $MATSPATH/RNASeq-MATS.py ...

2. Submit the script on Biowulf:

$ sbatch jobscript
To request more memory than the default (4 GB), use the --mem flag:
$ sbatch --mem=10g jobscript

To use multiple CPUs with MATS-NIH, add the --cpus-per-task option when submitting your script, e.g.

#!/bin/bash

module load mats-nih
cd /data/$USER/dir
python $MATSPATH/RNASeq-MATS.py ...
Submit with:
sbatch --mem=10g --cpus-per-task=32 jobscript
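
To confirm that a job actually received the resources requested at submission, the job script can print them before starting MATS. This is a minimal sketch, assuming the standard Slurm environment variables SLURM_CPUS_PER_TASK and SLURM_MEM_PER_NODE (set when --cpus-per-task and --mem are given); report_allocation is a hypothetical helper name, not part of MATS or Biowulf.

```shell
#!/bin/bash
# Hypothetical helper: report the CPUs and memory (in MB) that Slurm granted.
# Prints "unset" for a value when the variable is missing, for example when
# the script is run outside a Slurm allocation.
report_allocation() {
    echo "cpus=${SLURM_CPUS_PER_TASK:-unset} mem_mb=${SLURM_MEM_PER_NODE:-unset}"
}

report_allocation
```

Placing this line near the top of the job script leaves a record of the allocation in the job's output file.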

Running a swarm of jobs on Biowulf

Set up a swarm command file:

  cd /data/$USER/dir1; python $MATSPATH/RNASeq-MATS.py ...
  cd /data/$USER/dir2; python $MATSPATH/RNASeq-MATS.py ...
  cd /data/$USER/dir3; python $MATSPATH/RNASeq-MATS.py ...
	[......]
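
When there are many run directories, the swarm command file can be generated rather than typed by hand. The following is a sketch under stated assumptions: make_swarmfile is a hypothetical helper, and the trailing "..." is written out literally as a placeholder for the MATS arguments, which you must fill in before submitting.

```shell
#!/bin/bash
# Hypothetical helper: write one swarm command line per run directory.
# Usage: make_swarmfile OUTPUT_FILE DIR [DIR ...]
make_swarmfile() {
    local out=$1
    shift
    : > "$out"                      # create or truncate the output file
    local d
    for d in "$@"; do
        [ -d "$d" ] || continue     # skip anything that is not a directory
        # Single quotes keep $MATSPATH literal so it expands when the job runs.
        # The "..." is a placeholder for the MATS arguments.
        printf 'cd %s; python $MATSPATH/RNASeq-MATS.py ...\n' "$d" >> "$out"
    done
}

# Example: make_swarmfile swarmfile /data/$USER/dir*
```

Check the generated file with a text editor or cat before passing it to swarm.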
  

Submit the swarm file:

$ swarm -f swarmfile --module mats

-f: specify the swarmfile name
--module: load the required module for each command line in the file

To allocate more memory per process, use the -g flag (value in gigabytes):

  $ swarm -f swarmfile -g 20 --module mats

For more information on running swarm, see the swarm documentation (swarm.html).

To run a swarm of MATS-NIH jobs, you would submit with:

  swarm -f swarmfile --module mats-nih -t 32
  
where -t 32: number of threads that _each_ MATS process (i.e. one line in your swarm command file) should run
--module mats-nih: module to be loaded for each swarm job
[-g #]: optional, gigabytes of memory that each MATS process requires. Required if you need more memory than the default.

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load mats
cn999$ cd /data/$USER/dir
cn999$ python $MATSPATH/RNASeq-MATS.py ...
[...etc...]

cn999$ exit
exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use --mem. For example:

biowulf$ sinteractive --mem=10g

Documentation

http://rnaseq-mats.sourceforge.net/