High-Performance Computing at the NIH
MapSplice on Biowulf & Helix

MapSplice is software for mapping RNA-seq data to a reference genome for splice junction discovery. It depends only on the reference genome and not on any further annotations.

MapSplice was developed at the University of Kentucky. [MapSplice homepage]

Running MapSplice on Helix

$ module load mapsplice

$ module list

Currently Loaded Modules:
  1) mapsplice/2.1.8

$ cd /data/$USER/dir

$ python $MSHOME/mapsplice.py [options] -c <Reference_Sequence> -x <Bowtie_Index> -1 <Read_List1> -2 <Read_List2>

-----------------------------------------------
[Tue Aug 19 14:21:12 2014] Beginning Mapsplice run (MapSplice v2.1.8)
[Tue Aug 19 14:21:12 2014] Bin directory: /spin1/sys2/usrlocal/apps/mapsplice/MapSplice-v2.1.8/bin/ 
[Tue Aug 19 14:21:12 2014] Preparing output location mapsplice_out/
[Tue Aug 19 14:21:12 2014] Checking files or directory: Sp_ds.10k.left.fq
[Tue Aug 19 14:21:12 2014] Checking files or directory: Sp_ds.10k.right.fq
[Tue Aug 19 14:21:12 2014] Checking files or directory: ./
[Tue Aug 19 14:21:12 2014] Checking Bowtie index files
[Tue Aug 19 14:21:12 2014] Building Bowtie index for reference sequence
[Tue Aug 19 14:25:29 2014] Inspecting Bowtie index files
[Tue Aug 19 14:25:30 2014] Checking reference sequence length
[Tue Aug 19 14:25:30 2014] Checking consistency of Bowtie index and reference sequence
[Tue Aug 19 14:25:30 2014] Checking read format
-----[Read Format: FASTQ]
-----[Read Type: Pair End]
-----[Total # Reads: 20000]
-----[Max Read Length: 68]
-----[Min Read Length: 68]
-----[Max Quality Score: 71]
-----[Min Quality Score: 35]
-----[Quality Score Scale: Phred+33]
[Tue Aug 19 14:25:30 2014] Running MapSplice multi-thread
[Tue Aug 19 14:25:39 2014] Generating junctions from sam file
[Tue Aug 19 14:25:39 2014] Filtering junction by min mis and min lpq
[Tue Aug 19 14:25:39 2014] Filtering junction by ROC argu noncanonical
Waring: No original junctions found, skip build index step
[Tue Aug 19 14:25:39 2014] Running MapSplice multi-thread
[Tue Aug 19 14:25:46 2014] Converting unmapped reads to sam
[Tue Aug 19 14:25:46 2014] Converting unmapped reads to sam
[Tue Aug 19 14:25:46 2014] Running alignment handler
[Tue Aug 19 14:25:48 2014] Sorting file
[Tue Aug 19 14:25:48 2014] Setting unmapped paired end reads bit flag
[Tue Aug 19 14:25:49 2014] Formatting SAM file
[Tue Aug 19 14:25:49 2014] Collecting stats of read alignments and junctions

[Tue Aug 19 14:25:49 2014] Mapsplice finished running (time used: 0:04:36.581115)
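A run that ends normally prints the "Mapsplice finished running" line shown above, so a saved log can be checked for it. The following is a minimal sketch, assuming the run's output was captured to a file; the function name run_finished and the stand-in log are hypothetical and not part of MapSplice.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the saved log contains the completion line.
run_finished() {
    grep -q 'Mapsplice finished running' "$1"
}

# Demonstration with a stand-in log holding the final line from the session above.
# A real log would come from redirecting the output of the MapSplice run to a file.
log=$(mktemp)
echo '[Tue Aug 19 14:25:49 2014] Mapsplice finished running (time used: 0:04:36.581115)' > "$log"

if run_finished "$log"; then
    echo "run completed"
else
    echo "run did not complete"
fi
rm -f "$log"
```

A check like this is only a quick sanity test; it does not confirm that the alignments themselves are sensible.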

Batch job on Biowulf

Set up a batch script along the following lines.

#!/bin/bash 

cd /data/$USER/mydir
module load mapsplice

python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 Sp_ds.10k.left.fq \
    -2 Sp_ds.10k.right.fq -p $SLURM_CPUS_PER_TASK

Submit to the batch system with:

$ sbatch --cpus-per-task=8 jobscript

The value given to '--cpus-per-task' is passed to the script automatically as $SLURM_CPUS_PER_TASK. If more memory is needed, add --mem with the amount in gigabytes, for example:

$ sbatch --cpus-per-task=8 --mem=20g jobscript
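Slurm sets $SLURM_CPUS_PER_TASK only when '--cpus-per-task' is specified, so a script that is also tested outside a job may want a fallback. The following is a minimal sketch of one way to do that; the helper name pick_threads and the default of one thread are assumptions for illustration, and the MapSplice call is left commented out so the sketch runs anywhere.

```shell
#!/bin/bash
# Hypothetical helper: choose the thread count for MapSplice's -p option.
# Use the Slurm-provided value when present; otherwise fall back to 1 thread.
pick_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

threads=$(pick_threads)
echo "Using ${threads} thread(s)"

# The actual call from the batch script above would then read:
# python "$MSHOME/mapsplice.py" -c . -x /fdb/bowtie/indexes \
#     -1 Sp_ds.10k.left.fq -2 Sp_ds.10k.right.fq -p "$threads"
```

Submitted with 'sbatch --cpus-per-task=8', the helper yields 8; run by hand with the variable unset, it yields 1.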

 

Swarm of jobs on Biowulf

Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:

cd /data/user/mydir1; python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 seq1.fq -2 seq2.fq -p $SLURM_CPUS_PER_TASK
cd /data/user/mydir2; python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 seq1.fq -2 seq2.fq -p $SLURM_CPUS_PER_TASK
cd /data/user/mydir3; python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 seq1.fq -2 seq2.fq -p $SLURM_CPUS_PER_TASK
   [...]   

Submit this job with:

swarm -t 8 -f cmdfile --module mapsplice
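When there are many run directories, the command file can be generated rather than typed by hand. The following is a minimal sketch, assuming each directory holds reads named seq1.fq and seq2.fq as in the sample file; the function name make_swarm_lines is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print one swarm command line per run directory.
# Single quotes keep $MSHOME and $SLURM_CPUS_PER_TASK literal, so they are
# expanded when each line runs inside its job, not while the file is written.
make_swarm_lines() {
    local dir
    for dir in "$@"; do
        printf 'cd %s; python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 seq1.fq -2 seq2.fq -p $SLURM_CPUS_PER_TASK\n' "$dir"
    done
}

# Example: write three lines to a file named cmdfile in the current directory.
make_swarm_lines /data/user/mydir1 /data/user/mydir2 /data/user/mydir3 > cmdfile
```

The resulting cmdfile can then be submitted with the swarm command shown above.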

Interactive job on Biowulf

Some jobs need to be run interactively. Do not run them on the Biowulf login node. Instead, allocate an interactive node as described below and run the job there.

[user@biowulf]$ sinteractive -c 4 
      salloc.exe: Granted job allocation 1528

[user@pXXX]$ cd /data/user/myruns

[user@pXXX]$ module load mapsplice

[user@pXXX]$ cd /data/userID/mapsplice/run1

[user@pXXX]$ python $MSHOME/mapsplice.py -Q fq -o output_path -u file1.fastq -c /fdb/genome/hg19/chr_all.fa -b /usr/local/bowtie-indexes --threads $SLURM_CPUS_PER_TASK -L 18 2> output.log

[user@pXXX]$ exit

[user@biowulf]$ 

Documentation

http://www.netlab.uky.edu/p/bioinfo/MapSpliceManual