MapSplice on Biowulf
MapSplice is software for mapping RNA-seq reads to a reference genome and discovering splice junctions. It depends only on the reference genome, not on any additional annotation.
References:
- Wang et al., MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Research (2010).
Documentation
Important Notes
- Module Name: mapsplice (see the modules page for more information)
- Multithreaded application
- Environment variables set:
- $MSHOME (the MapSplice installation directory; a quick check is shown below)
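All examples on this page invoke the driver script through $MSHOME, so a quick sanity check after loading the module is to confirm that the variable points at the installation. A minimal sketch (the printed path will vary with the installed version):

[user@cn3144 ~]$ module load mapsplice
[user@cn3144 ~]$ echo $MSHOME
[user@cn3144 ~]$ ls $MSHOME/mapsplice.py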
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive -c 4
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load mapsplice
[user@cn3144 ~]$ python $MSHOME/mapsplice.py -Q fq -o output_path -u file1.fastq \
    -c /fdb/genome/hg19/chr_all.fa -b /usr/local/bowtie-indexes \
    --threads $SLURM_CPUS_PER_TASK -L 18 2>output.log
-----------------------------------------------
[Tue Aug 19 14:21:12 2014] Beginning Mapsplice run (MapSplice v2.1.8)
[Tue Aug 19 14:21:12 2014] Bin directory: /spin1/sys2/usrlocal/apps/mapsplice/MapSplice-v2.1.8/bin/
[Tue Aug 19 14:21:12 2014] Preparing output location mapsplice_out/
[Tue Aug 19 14:21:12 2014] Checking files or directory: Sp_ds.10k.left.fq
[Tue Aug 19 14:21:12 2014] Checking files or directory: Sp_ds.10k.right.fq
[Tue Aug 19 14:21:12 2014] Checking files or directory: ./
[Tue Aug 19 14:21:12 2014] Checking Bowtie index files
[Tue Aug 19 14:21:12 2014] Building Bowtie index for reference sequence
[Tue Aug 19 14:25:29 2014] Inspecting Bowtie index files
[Tue Aug 19 14:25:30 2014] Checking reference sequence length
[Tue Aug 19 14:25:30 2014] Checking consistency of Bowtie index and reference sequence
[Tue Aug 19 14:25:30 2014] Checking read format
-----[Read Format: FASTQ]
-----[Read Type: Pair End]
-----[Total # Reads: 20000]
-----[Max Read Length: 68]
-----[Min Read Length: 68]
-----[Max Quality Score: 71]
-----[Min Quality Score: 35]
-----[Quality Score Scale: Phred+33]
[Tue Aug 19 14:25:30 2014] Running MapSplice multi-thread
[Tue Aug 19 14:25:39 2014] Generating junctions from sam file
[Tue Aug 19 14:25:39 2014] Filtering junction by min mis and min lpq
[Tue Aug 19 14:25:39 2014] Filtering junction by ROC argu noncanonical
Waring: No original junctions found, skip build index step
[Tue Aug 19 14:25:39 2014] Running MapSplice multi-thread
[Tue Aug 19 14:25:46 2014] Converting unmapped reads to sam
[Tue Aug 19 14:25:46 2014] Converting unmapped reads to sam
[Tue Aug 19 14:25:46 2014] Running alignment handler
[Tue Aug 19 14:25:48 2014] Sorting file
[Tue Aug 19 14:25:48 2014] Setting unmapped paired end reads bit flag
[Tue Aug 19 14:25:49 2014] Formatting SAM file
[Tue Aug 19 14:25:49 2014] Collecting stats of read alignments and junctions
[Tue Aug 19 14:25:49 2014] Mapsplice finished running (time used: 0:04:36.581115)

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
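Note that the log above comes from a paired-end run (Sp_ds.10k.left.fq / Sp_ds.10k.right.fq), while the command shown uses single-end input (-u). For paired-end data, use the -1/-2 options as in the batch example below. A minimal interactive sketch, run from the directory containing the reference sequence and with placeholder read file names:

[user@cn3144 ~]$ python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes \
    -1 sample_R1.fq -2 sample_R2.fq -o output_path \
    -p $SLURM_CPUS_PER_TASK 2>output.log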
Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. mapsplice.sh). For example:
#!/bin/bash
cd /data/$USER/mydir
module load mapsplice
python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 Sp_ds.10k.left.fq \
    -2 Sp_ds.10k.right.fq -p $SLURM_CPUS_PER_TASK
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=8 --mem=10g mapsplice.sh
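Other standard sbatch options can be added to the same submission; for example, a sketch with an explicit walltime (the 24-hour value is only illustrative):

sbatch --cpus-per-task=8 --mem=10g --time=24:00:00 mapsplice.sh

The --cpus-per-task value is exported to the job as $SLURM_CPUS_PER_TASK, which the script above passes to MapSplice via -p.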
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.
Create a swarmfile (e.g. mapsplice.swarm). For example:
cd /data/user/mydir1; python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 seq1.fq -2 seq2.fq -p $SLURM_CPUS_PER_TASK
cd /data/user/mydir2; python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 seq1.fq -2 seq2.fq -p $SLURM_CPUS_PER_TASK
cd /data/user/mydir3; python $MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 seq1.fq -2 seq2.fq -p $SLURM_CPUS_PER_TASK
[...]
Submit this job using the swarm command.
swarm -f mapsplice.swarm -g 10 -t 8 --module mapsplice
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file)
--module mapsplice | Loads the mapsplice module for each subjob in the swarm
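If the per-sample directories follow a regular naming pattern, the swarm file can be generated rather than written by hand. A minimal sketch, assuming directories /data/user/mydir1, /data/user/mydir2, ... that each contain seq1.fq and seq2.fq:

for d in /data/user/mydir*; do
    echo "cd $d; python \$MSHOME/mapsplice.py -c . -x /fdb/bowtie/indexes -1 seq1.fq -2 seq2.fq -p \$SLURM_CPUS_PER_TASK"
done > mapsplice.swarm

The escaped \$MSHOME and \$SLURM_CPUS_PER_TASK are written literally into the swarm file so that they expand inside each subjob, where --module mapsplice defines $MSHOME and swarm's -t option sets $SLURM_CPUS_PER_TASK.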