Biowulf High Performance Computing at the NIH
RapMap: Rapid sensitive and accurate read mapping via quasi-mapping

RapMap is a tool for rapid sensitive and accurate read mapping via quasi-mapping. It is capable of mapping sequencing reads to a target transcriptome substantially faster than existing alignment tools.

References:

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive -c14 
[user@@cn3107 ~]$module load RapMap
[user@@cn3107 ~]$rapmap -h
RapMap Transcriptome Aligner v0.5.0
=====================================

There are currently 4 RapMap subcommands
    pseudoindex   --- builds a k-mer-based index
    pseudomap     --- map reads using a k-mer-based index
    quasiindex --- builds a suffix array-based (SA) index
    quasimap   --- map reads using the SA-based index

Run a corresponding command "rapmap  -h" for
more information on each of the possible RapMap
commands.
[user@@cn3107 ~]$rapmap pseudoindex -h
RapMap Indexer

USAGE: 

   rapmap  [-k ] -i  -t  [--]
           [--version] [-h]
Where: 
   -k ,  --klen 
     The length of k-mer to index

   -i ,  --index 
     (required)  The location where the index should be written

   -t ,  --transcripts 
     (required)  The transcript file to be indexed

   --,  --ignore_rest
     Ignores the rest of the labeled arguments following this flag.

   --version
     Displays version information and exits.

   -h,  --help
     Displays usage information and exits.
   RapMap Indexera
Create a RapMap pseudoindex from the sample transcript data:
[user@@cn3107 ~]$cp $RAPMAP_DATA/* .
[user@@cn3107 ~]$rapmap pseudoindex -k 25 -i pseudoindex_dir -t transcripts.fasta
RapMap Indexer

[Step 1 of 4] : counting k-mers

Elapsed time: 0.017061s
Clipped poly-A tails from 0 transcripts

[Step 2 of 4] : marking k-mers
marked kmers for 0 transcripts
Elapsed time: 0.0105816s

[Step 3 of 4] : building k-mers equivalence classes
done! There were 21 classes
Elapsed time: 2.53394s

[Step 4 of 4] : finalizing index
finalized kmers for 0 transcripts
Elapsed time: 0.0528086s
Writing the index to pseudoindex_dir/
transcriptIDs.size() = 28187
parsed 15 transcripts
Use the created pseudoindex to map read(s) to the transcript:
[user@@cn3107 ~]$rapmap pseudomap -i pseudoindex_dir -m 3 -t 14 -o pseudomap.txt -r reads_1.fastq 
RapMap Mapper
[2018-09-07 08:50:32.985] [stderrLog] [info] loading k-mer info list . . .
[2018-09-07 08:50:32.987] [stderrLog] [info] done

Elapsed time: 0.00210966s
[2018-09-07 08:50:32.987] [stderrLog] [info] loading k-mer => id hash . . . 
[2018-09-07 08:50:32.988] [stderrLog] [info] 	header format = gus/special
[2018-09-07 08:50:32.988] [stderrLog] [info] 	# distinct k-mers = 18925
[2018-09-07 08:50:32.988] [stderrLog] [info] 	hash key len = 50
[2018-09-07 08:50:32.988] [stderrLog] [info] 	counter len = 32
[2018-09-07 08:50:32.988] [stderrLog] [info] 	max reprobe offset = 126
[2018-09-07 08:50:32.988] [stderrLog] [info] 	size in bytes = 1073741824
[2018-09-07 08:50:33.561] [stderrLog] [info] done
Elapsed time: 0.574164s
[2018-09-07 08:50:33.562] [stderrLog] [info] loading eq classes . . . 
[2018-09-07 08:50:33.562] [stderrLog] [info] [21] classes
[2018-09-07 08:50:33.562] [stderrLog] [info] done
Elapsed time: 9.4765e-05s
[2018-09-07 08:50:33.562] [stderrLog] [info] loading eq class labels . . . 
[2018-09-07 08:50:33.562] [stderrLog] [info] [30] labels
[2018-09-07 08:50:33.562] [stderrLog] [info] done
Elapsed time: 3.0231e-05s
[2018-09-07 08:50:33.563] [stderrLog] [info] loading position list . . . 
[2018-09-07 08:50:33.563] [stderrLog] [info] [28187] total k-mer positions
[2018-09-07 08:50:33.563] [stderrLog] [info] done
Elapsed time: 6.9863e-05s
[2018-09-07 08:50:33.563] [stderrLog] [info] loading transcript names 
[2018-09-07 08:50:33.563] [stderrLog] [info] [15] transcripts in index 
[2018-09-07 08:50:33.563] [stderrLog] [info] done 
Elapsed time: 2.4205e-05s
[2018-09-07 08:50:33.563] [stderrLog] [info] loading transcript lengths
[2018-09-07 08:50:33.563] [stderrLog] [info] [15] transcripts in index
[2018-09-07 08:50:33.563] [stderrLog] [info] done 
Elapsed time: 1.9552e-05s
[2018-09-07 08:50:33.564] [stderrLog] [info] loading forward jumps
[2018-09-07 08:50:33.564] [stderrLog] [info] [18925] forward jumps
[2018-09-07 08:50:33.564] [stderrLog] [info] done 
Elapsed time: 5.0421e-05s
[2018-09-07 08:50:33.564] [stderrLog] [info] loading reverse jumps
[2018-09-07 08:50:33.564] [stderrLog] [info] [18925] reverse jumps
[2018-09-07 08:50:33.564] [stderrLog] [info] done 
Elapsed time: 4.554e-05s

[2018-09-07 08:50:33.565] [stderrLog] [info] mapping reads . . . 

[2018-09-07 08:50:33.586] [stderrLog] [info] Done mapping reads.
[2018-09-07 08:50:33.586] [stderrLog] [info] In total saw 10000 reads.
[2018-09-07 08:50:33.586] [stderrLog] [info] Final # hits per read = 1.4554
[2018-09-07 08:50:33.586] [stderrLog] [info] Discarded 0 reads because they had > 3 alignments
[2018-09-07 08:50:33.586] [stderrLog] [info] flushing output
Elapsed time: 0.0210106s

Likewise, produce a RapMap quasiindex and then use it to map read(s) to the transcript:
[user@@cn3107 ~]$rapmap quasiindex -k 25 -i  quasiindex_dir -t transcripts.fasta
RapMap Indexer

[Step 1 of 4] : counting k-mers
Elapsed time: 0.00309017s

Replaced 0 non-ATCG nucleotides
Clipped poly-A tails from 0 transcripts
Building rank-select dictionary and saving to disk done
Elapsed time: 3.8894e-05s
Writing sequence data to file . . . done
Elapsed time: 8.7386e-05s
[info] Building 32-bit suffix array (length of generalized text is 28577)
Building suffix array . . . success
saving to disk . . . done
Elapsed time: 0.000133423s
done
Elapsed time: 0.00654297s
processed 0 positions
khash had 18936 keys
saving hash to disk . . . done
Elapsed time: 0.00140053s
[denisovga@cn4468 sample_data]$  rapmap quasimap -i quasiindex_dir -m 3 -t 14 -o pseudomap.txt -r reads_2.fastq
[2018-09-07 08:52:19.758] [stderrLog] [info] 
command line options
====================
index: 
unmated read(s): reads_2.fastq
output: pseudomap.txt
num. threads: 14
max num. hits: 3
quasi-coverage: 0
no output: false
sensitive: false
strict check: true
fuzzy intersection: false
consistent hits: false
====================
[2018-09-07 08:52:19.760] [stderrLog] [info] Loading Suffix Array 
[2018-09-07 08:52:19.761] [stderrLog] [info] Loading Transcript Info 
[2018-09-07 08:52:19.761] [stderrLog] [info] Loading Rank-Select Bit Array
[2018-09-07 08:52:19.761] [stderrLog] [info] There were 15 set bits in the bit array
[2018-09-07 08:52:19.761] [stderrLog] [info] Computing transcript lengths
[2018-09-07 08:52:19.761] [stderrLog] [info] Waiting to finish loading hash
[2018-09-07 08:52:19.764] [stderrLog] [info] Done loading index

[2018-09-07 08:52:19.765] [stderrLog] [info] mapping reads . . . 

[2018-09-07 08:52:19.815] [stderrLog] [info] Done mapping reads.
[2018-09-07 08:52:19.815] [stderrLog] [info] In total saw 10000 reads.
[2018-09-07 08:52:19.815] [stderrLog] [info] Final # hits per read = 1.4542
[2018-09-07 08:52:19.815] [stderrLog] [info] flushing output queue.
Elapsed time: 0.0498976s
End the interactive session:
[user@cn4468 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. rapmap.sh). For example:

#!/bin/bash
#SBATCH --mem=4g
module load RapMap/0.5.0    
mkdir /scratch/$USER/RapMap  
cd /scratch/$USER/RapMap
cp $RAPMAP_DATA/* . 
rapmap pseudoindex -k 25 -i pseudoindex_dir -t transcripts.fasta
rapmap pseudomap -i pseudoindex_dir -m 3 -t 14 -o pseudomap.txt -r reads_1.fastq
rapmap quasiindex -k 25 -i  quasiindex_dir -t transcripts.fasta
rapmap quasimap -i quasiindex_dir -m 3 -t 14 -o pseudomap.txt -r reads_2.fastq

Submit this job using the Slurm sbatch command.

sbatch rapmap.sh
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. rapmap.swarm). For example:

#!/bin/bash
mkdir -p /scratch/$USER/RapMap 
cd /scratch/$USER/RapMap
rapmap pseudoindex -k 25 -i pseudoindex_dir -t transcripts.fasta
rapmap pseudomap -i pseudoindex_dir -m 3 -t 14 -o pseudomap.txt -r reads_1.fastq
rapmap quasiindex -k 25 -i  quasiindex_dir -t transcripts.fasta
rapmap quasimap -i quasiindex_dir -m 3 -t 14 -o pseudomap.txt -r reads_2.fastq

Submit this job using the swarm command.

swarm -f rapmap.swarm --module RapMap/0.5.0 -g 4 --gres lscratch:10