MapSplice on Biowulf

MapSplice is software for mapping RNA-seq reads to a reference genome for splice junction discovery. It depends only on the reference genome and does not require any further annotations.

References:

Wang et al. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Research (2010).

Important Notes

Module Name: mapsplice (see the modules page for more information)
Multithreaded app
Environment variables set:
- $MSHOME
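Before starting a long run, it can help to confirm that loading the module actually set `$MSHOME`. The function below is a minimal sketch and is not part of the module. The name `check_mshome` is hypothetical, and it assumes `mapsplice.py` sits directly in `$MSHOME`, which is how the commands on this page call it.

```shell
# Hypothetical helper (not provided by the mapsplice module):
# verify that $MSHOME is set and contains mapsplice.py.
# Assumes mapsplice.py lives directly in $MSHOME, as used on this page.
check_mshome() {
    if [[ -z "${MSHOME:-}" ]]; then
        echo "MSHOME is not set; run 'module load mapsplice' first" >&2
        return 1
    fi
    if [[ ! -f "$MSHOME/mapsplice.py" ]]; then
        echo "mapsplice.py not found in $MSHOME" >&2
        return 1
    fi
    echo "Using $MSHOME/mapsplice.py"
}
```

Typical use is `module load mapsplice && check_mshome`.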

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive -c 4
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load mapsplice

[user@cn3144 ~]$ python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 Sp_ds.10k.left.fq -2 Sp_ds.10k.right.fq -p $SLURM_CPUS_PER_TASK
-----------------------------------------------
[Tue Aug 19 14:21:12 2014] Beginning Mapsplice run (MapSplice v2.1.8)
[Tue Aug 19 14:21:12 2014] Bin directory: /spin1/sys2/usrlocal/apps/mapsplice/MapSplice-v2.1.8/bin/ 
[Tue Aug 19 14:21:12 2014] Preparing output location mapsplice_out/
[Tue Aug 19 14:21:12 2014] Checking files or directory: Sp_ds.10k.left.fq
[Tue Aug 19 14:21:12 2014] Checking files or directory: Sp_ds.10k.right.fq
[Tue Aug 19 14:21:12 2014] Checking files or directory: ./
[Tue Aug 19 14:21:12 2014] Checking Bowtie index files
[Tue Aug 19 14:21:12 2014] Building Bowtie index for reference sequence
[Tue Aug 19 14:25:29 2014] Inspecting Bowtie index files
[Tue Aug 19 14:25:30 2014] Checking reference sequence length
[Tue Aug 19 14:25:30 2014] Checking consistency of Bowtie index and reference sequence
[Tue Aug 19 14:25:30 2014] Checking read format
-----[Read Format: FASTQ]
-----[Read Type: Pair End]
-----[Total # Reads: 20000]
-----[Max Read Length: 68]
-----[Min Read Length: 68]
-----[Max Quality Score: 71]
-----[Min Quality Score: 35]
-----[Quality Score Scale: Phred+33]
[Tue Aug 19 14:25:30 2014] Running MapSplice multi-thread
[Tue Aug 19 14:25:39 2014] Generating junctions from sam file
[Tue Aug 19 14:25:39 2014] Filtering junction by min mis and min lpq
[Tue Aug 19 14:25:39 2014] Filtering junction by ROC argu noncanonical
Waring: No original junctions found, skip build index step
[Tue Aug 19 14:25:39 2014] Running MapSplice multi-thread
[Tue Aug 19 14:25:46 2014] Converting unmapped reads to sam
[Tue Aug 19 14:25:46 2014] Converting unmapped reads to sam
[Tue Aug 19 14:25:46 2014] Running alignment handler
[Tue Aug 19 14:25:48 2014] Sorting file
[Tue Aug 19 14:25:48 2014] Setting unmapped paired end reads bit flag
[Tue Aug 19 14:25:49 2014] Formatting SAM file
[Tue Aug 19 14:25:49 2014] Collecting stats of read alignments and junctions

[Tue Aug 19 14:25:49 2014] Mapsplice finished running (time used: 0:04:36.581115)
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
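When MapSplice runs in a batch job, its progress messages go to a file rather than to the screen. By default, Slurm writes a job's standard output and standard error to `slurm-<jobid>.out` in the directory where `sbatch` was run. One quick completion check is to look for the final `Mapsplice finished running` line seen in the session above.

The function below is a hypothetical helper and is not shipped with MapSplice. It assumes the log wording matches the v2.1.8 output shown here.

```shell
# Hypothetical helper: summarize a MapSplice log file.
# Relies on the message wording shown in the v2.1.8 sample session.
mapsplice_summary() {
    local log="$1" reads
    if [[ ! -r "$log" ]]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # Read counts are reported as "-----[Total # Reads: N]"
    reads=$(sed -n 's/.*\[Total # Reads: \([0-9]*\)].*/\1/p' "$log" | head -n 1)
    echo "reads: ${reads:-unknown}"
    # A completed run logs "Mapsplice finished running (time used: ...)"
    if grep -q 'Mapsplice finished running' "$log"; then
        echo "status: finished"
    else
        echo "status: incomplete"
        return 1
    fi
}
```

For example, run `mapsplice_summary slurm-<jobid>.out` after the job ends. A nonzero return code means the completion line was not found.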

Batch job

Most jobs should be run as batch jobs.

Create a batch input file (e.g. mapsplice.sh). For example:

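#!/bin/bash

cd /data/$USER/mydir        # directory containing the input FASTQ files
module load mapsplice       # sets $MSHOME

# -c: directory holding the reference genome FASTA files (here the current directory)
# -x: Bowtie index; -1/-2: the two mate files; -p: number of threads
python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 Sp_ds.10k.left.fq \
    -2 Sp_ds.10k.right.fq -p $SLURM_CPUS_PER_TASK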
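In the sample session, MapSplice treated the two input files as one paired-end set. Both mate files must therefore describe the same reads in the same order.

A cheap check before submitting is to confirm that both files exist and contain the same number of records. The function below is a hypothetical sketch with several assumptions:

- The input is uncompressed FASTQ with exactly four lines per record and no wrapped sequence lines.
- Each file ends with a newline.
- Only the record counts are compared. Read names are not checked.

```shell
# Hypothetical pre-flight check for a pair of mate files.
# Assumes uncompressed FASTQ, 4 lines per record, trailing newline present.
check_fastq_pair() {
    local left="$1" right="$2" f n
    local -a counts=()
    for f in "$left" "$right"; do
        if [[ ! -r "$f" ]]; then
            echo "cannot read $f" >&2
            return 1
        fi
        n=$(wc -l < "$f")
        if (( n % 4 != 0 )); then
            echo "$f: $n lines is not a multiple of 4" >&2
            return 1
        fi
        counts+=( $(( n / 4 )) )
    done
    # Mate files must contain the same number of records
    if (( counts[0] != counts[1] )); then
        echo "record counts differ: ${counts[0]} vs ${counts[1]}" >&2
        return 1
    fi
    echo "${counts[0]} read pairs"
}
```

To use it, place a line such as `check_fastq_pair Sp_ds.10k.left.fq Sp_ds.10k.right.fq || exit 1` in the batch script before the `python` line. The job then stops early instead of failing partway through the alignment.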

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=8 --mem=10g mapsplice.sh

Swarm of jobs

A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. mapsplice.swarm). For example:

cd /data/$USER/mydir1; python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 seq1.fq -2 seq2.fq -p $SLURM_CPUS_PER_TASK
cd /data/$USER/mydir2; python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 seq1.fq -2 seq2.fq -p $SLURM_CPUS_PER_TASK
cd /data/$USER/mydir3; python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 seq1.fq -2 seq2.fq -p $SLURM_CPUS_PER_TASK
   [...]
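With many samples, a short loop can generate the swarmfile instead of typing each line by hand. This is a sketch that assumes each directory contains a mate pair named `seq1.fq` and `seq2.fq`, as in the example. The function name `make_swarm` is hypothetical.

The format string is single-quoted, so `$MSHOME` and `$SLURM_CPUS_PER_TASK` are written literally. They are then expanded inside each subjob, where the module and Slurm set them.

```shell
# Hypothetical helper: write one MapSplice command per sample directory.
# Assumes each directory contains seq1.fq and seq2.fq, as in the example.
make_swarm() {
    local out="$1" dir
    shift
    : > "$out"    # start with an empty swarmfile
    for dir in "$@"; do
        # %q shell-quotes the directory name; the single-quoted format keeps
        # $MSHOME and $SLURM_CPUS_PER_TASK unexpanded until the subjob runs.
        printf 'cd %q; python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 seq1.fq -2 seq2.fq -p $SLURM_CPUS_PER_TASK\n' "$dir" >> "$out"
    done
}
```

For example, `make_swarm mapsplice.swarm /data/$USER/mydir{1..3}` writes one line for each of three directories.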

Submit this job using the swarm command.

swarm -f mapsplice.swarm -g 10 -t 8 --module mapsplice

where

`-g #`	Number of gigabytes of memory required for each process (1 line in the swarm command file).
`-t #`	Number of threads/CPUs required for each process (1 line in the swarm command file).
`--module mapsplice`	Loads the mapsplice module for each subjob in the swarm