High-Performance Computing at the NIH
FusionMap on Biowulf & Helix

FusionMap is an efficient fusion aligner that aligns reads spanning fusion junctions directly to the genome, without prior knowledge of potential fusion regions. It detects and characterizes fusion junctions at base-pair resolution. FusionMap can detect fusion junctions in both single- and paired-end datasets from either gDNA-Seq or RNA-Seq studies.

How To Use

First, set up your environment for FusionMap:

module load fusionmap

Due to licensing restrictions, only the most current version of FusionMap can be made available.

FusionMap is run with an input configuration file. Example configuration files are provided in $FMBIN/../TestDataset. Any text following '//' on a line is treated as a comment and ignored by FusionMap.

The file paths in these examples must be changed to match your own data and directories.
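Before editing the bundled examples, it can help to copy them somewhere writable and to check which lines FusionMap will actually read. The sketch below assumes that a comment runs from '//' to the end of its line; the helper name fm_effective_config and the destination directory are hypothetical choices, not part of FusionMap.

```bash
#!/bin/bash
# Hypothetical helper: print a FusionMap configuration file with '//' comments
# and blank lines removed, i.e. roughly the lines FusionMap will act on.
# Assumption: a comment runs from '//' to the end of the line.
fm_effective_config() {
    sed -e 's|//.*$||' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Copy the bundled examples to a writable location. This only runs when
# 'module load fusionmap' has set $FMBIN; the destination is an arbitrary example.
if [ -n "${FMBIN:-}" ] && [ -d "$FMBIN/../TestDataset" ]; then
    mkdir -p "/data/$USER/fusionmap_examples"
    cp -R "$FMBIN/../TestDataset" "/data/$USER/fusionmap_examples/"
fi
```

For example, `fm_effective_config myconfig.txt` (a hypothetical file name) prints only the non-comment settings, which makes it easier to spot paths you forgot to change.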

Choose a directory to serve as the Base_Directory. The Base_Directory holds the downloaded or built reference and index files that FusionMap needs. If FusionMap cannot find the required files there, it will attempt to download them. NOTE: Downloading will fail if attempted from a cluster node, because the Biowulf cluster is NOT connected to the internet. The download step must therefore be run on Helix.
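Because downloads only work outside the cluster, a script that may trigger a download can refuse to start when it finds itself inside a Slurm job. This is a minimal sketch under the assumption that SLURM_JOB_ID is set in every shell started by Slurm (sbatch, swarm or sinteractive); the function name and the Base_Directory path are hypothetical.

```bash
#!/bin/bash
# Hypothetical pre-flight check for the download step: fail inside a Slurm job,
# because Biowulf compute nodes cannot reach the internet.
# Assumption: SLURM_JOB_ID is set in any shell started by Slurm.
fm_check_download_host() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "Inside Slurm job $SLURM_JOB_ID: downloads will fail here; run this step on Helix." >&2
        return 1
    fi
    return 0
}

# Example usage on Helix (the Base_Directory path is only an example):
#   fm_check_download_host && mkdir -p "/data/$USER/fusionmap_ref"
```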

Helix Systems staff maintain a small number of commonly used reference libraries in /fdb/fusionmap:

Reference Build   Gene Model
Human.B37         RefGene
Human.B37         Ensembl.R70
Human.B37         UcscGene20130723
Human.B37.3       RefGene
Human.B37.3       Ensembl.R73
Human.B37.3       UcscGene20130723
Human.hg19        RefGene
Human.hg19        Ensembl.R73
Human.hg19        UcscGene20130723
Mouse.B38         RefGene
Mouse.B38         Ensembl.R73
Mouse.B38         UcscGene20130723
Mouse.mm10        RefGene
Mouse.mm10        Ensembl.R73
Mouse.mm10        UcscGene20130723

Please contact staff@hpc.nih.gov if you would like an additional reference library installed.
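If your job scripts take the reference build and gene model as parameters, it may be worth checking up front that the pair is one of the shared libraries listed above, since FusionMap attempts to download any files it cannot find. The helper below is hypothetical; its list is copied by hand from the table and must be kept in sync if the installed libraries change.

```bash
#!/bin/bash
# Hypothetical helper: succeed (return 0) only if REF_BUILD / GENE_MODEL is one
# of the shared libraries under /fdb/fusionmap listed in the table above.
# Usage: fm_is_shared_library REF_BUILD GENE_MODEL
fm_is_shared_library() {
    local ref="$1" gene="$2"
    case "$ref" in
        Human.B37)
            # Human.B37 is the only build paired with Ensembl release 70
            case "$gene" in
                RefGene|Ensembl.R70|UcscGene20130723) return 0 ;;
            esac
            ;;
        Human.B37.3|Human.hg19|Mouse.B38|Mouse.mm10)
            case "$gene" in
                RefGene|Ensembl.R73|UcscGene20130723) return 0 ;;
            esac
            ;;
    esac
    return 1
}
```

For example, `fm_is_shared_library Human.B37.3 RefGene` succeeds, while `fm_is_shared_library Human.B37 Ensembl.R73` fails because that combination is not in the table.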

NOTE on gene filters: If you build your own reference library, you will need to copy a set of gene-filtering files into its Base_Directory:

cp -R /fdb/fusionmap/Fusion /path/to/[user-defined Base_Directory]

where [user-defined Base_Directory] is the Base_Directory of the reference library you created.
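The copy step can be wrapped so that it creates the Base_Directory if needed and does not silently succeed when the filter source is missing. A minimal sketch; the function name is hypothetical, and the default source path is the /fdb/fusionmap/Fusion directory given above.

```bash
#!/bin/bash
# Hypothetical helper: create a Base_Directory for a self-built reference library
# and copy the gene-filter directory into it (skipped if already present).
# Usage: fm_setup_base_dir /path/to/Base_Directory [filter_source]
fm_setup_base_dir() {
    local base="$1"
    local filters="${2:-/fdb/fusionmap/Fusion}"
    if [ ! -d "$filters" ]; then
        echo "Gene-filter directory not found: $filters" >&2
        return 1
    fi
    mkdir -p "$base" || return 1
    # Only copy if the filter directory is not already in the Base_Directory
    if [ ! -d "$base/$(basename "$filters")" ]; then
        cp -R "$filters" "$base/" || return 1
    fi
}
```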

Once the Base_Directory and the input configuration file are in place, run:

mono $FMBIN/FusionMap.exe --semap /path/to/Base_Directory [RefLib Name] [GeneModel Name] /path/to/input/configuration/file > run.log

where [RefLib Name] and [GeneModel Name] are replaced with the desired reference build and gene model pair from the table above (for example, Human.B37.3 RefGene).
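The command above can be wrapped in a small function that checks its inputs before starting a long run. This is a sketch, not an official FusionMap interface: fm_run is a hypothetical name, and it assumes `module load fusionmap` has been run so that $FMBIN is set.

```bash
#!/bin/bash
# Hypothetical wrapper around: mono $FMBIN/FusionMap.exe --semap BASE REF GENE CONFIG > LOG
# Assumes 'module load fusionmap' has set $FMBIN.
# Usage: fm_run BASE_DIR REF_BUILD GENE_MODEL CONFIG_FILE [LOG_FILE]
fm_run() {
    if [ "$#" -lt 4 ]; then
        echo "usage: fm_run BASE_DIR REF_BUILD GENE_MODEL CONFIG_FILE [LOG_FILE]" >&2
        return 2
    fi
    local base="$1" ref="$2" gene="$3" config="$4" log="${5:-run.log}"
    if [ -z "${FMBIN:-}" ]; then
        echo "FMBIN is not set; run 'module load fusionmap' first" >&2
        return 1
    fi
    if [ ! -f "$config" ]; then
        echo "Configuration file not found: $config" >&2
        return 1
    fi
    # Run FusionMap and send its standard output to the log file
    mono "$FMBIN/FusionMap.exe" --semap "$base" "$ref" "$gene" "$config" > "$log"
}
```

For example: `fm_run /path/to/Base_Directory Human.B37.3 RefGene /path/to/input/configuration/file`.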

Running FusionMap on Helix

This script automates the basic use of FusionMap, from building reference indices to aligning FASTQ files to the references. It uses an Ensembl build and annotation, but this can be modified.


Submitting a single batch job

Create an input configuration file, along the lines of the above examples.

Next, create a batch script file containing lines similar to those below. Modify the file paths and reference labels where appropriate before submitting it.

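#!/bin/bash
# Report the number of CPUs Slurm allocated and dump the environment (useful for debugging)
echo "Running on $SLURM_CPUS_PER_TASK cores"
env
# Load FusionMap; this sets $FMBIN
module load fusionmap
# Work in the directory that holds the input configuration file
cd /data/$USER/somewhereWithInputConfigFile
# Align using the Human.B37.3 reference build and the RefGene gene model
mono $FMBIN/FusionMap.exe --semap /path/to/Base_Directory Human.B37.3 RefGene /path/to/input/configuration/file > run.log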

Submit to the batch system with:

$ sbatch jobscript
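Resource requests can also be written into the batch script itself as #SBATCH directives, so they need not be repeated on the sbatch command line. The sketch below writes such a script; the job name, the 10 GB memory request and the paths are example values to adjust, not recommended settings.

```bash
#!/bin/bash
# Write a FusionMap batch script with its Slurm options embedded as #SBATCH lines.
# The quoted 'EOF' keeps $USER and $FMBIN unexpanded until the job actually runs.
cat > jobscript <<'EOF'
#!/bin/bash
#SBATCH --job-name=fusionmap
#SBATCH --mem=10g
module load fusionmap
cd /data/$USER/somewhereWithInputConfigFile
mono $FMBIN/FusionMap.exe --semap /path/to/Base_Directory Human.B37.3 RefGene /path/to/input/configuration/file > run.log
EOF
# Submit afterwards with: sbatch jobscript
```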

If the job needs more memory, use --mem=Mg, where M is the number of gigabytes. For example, the following allocates 10 GB:

$ sbatch --mem=10g jobscript

Submitting a swarm of jobs on Biowulf

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:

cd /data/user/run1/; mono $FMBIN/FusionMap.exe .... [options]
cd /data/user/run2/; mono $FMBIN/FusionMap.exe .... [options]
...
cd /data/user/run10/; mono $FMBIN/FusionMap.exe .... [options]
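For many runs, the command file can be generated with a loop instead of being typed by hand. This sketch rests on hypothetical assumptions: the runs live in /data/$USER/run1 through run10, each directory holds a configuration file named fusionmap.config, and every run uses Human.B37.3 with RefGene.

```bash
#!/bin/bash
# Generate a swarm command file with one FusionMap command per run directory.
# Hypothetical layout: /data/$USER/run1 ... run10, each with its own fusionmap.config.
user="${USER:-$(id -un)}"
cmdfile=cmdfile
: > "$cmdfile"    # start with an empty file
for ((i = 1; i <= 10; i++)); do
    # $FMBIN is written literally so it is expanded later, inside each swarm job
    printf 'cd /data/%s/run%d/; mono $FMBIN/FusionMap.exe --semap /path/to/Base_Directory Human.B37.3 RefGene fusionmap.config > run.log\n' \
        "$user" "$i" >> "$cmdfile"
done
```

The resulting file is submitted exactly like a hand-written one, with `swarm -f cmdfile --module fusionmap`.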

Submit the swarm on Biowulf:

$ swarm -f cmdfile --module fusionmap

-f: the swarm command file name
--module: the module(s) required for the commands.

If each command line needs 10 GB of memory, use the '-g 10' flag:

$ swarm -g 10 -f cmdfile --module fusionmap

For more information on running swarm, see swarm.html.


Running an interactive job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the job there.

[user@biowulf] $ sinteractive
salloc.exe: Granted job allocation 1528

[user@pxxx]$ cd /data/user/myruns
[user@pxxx]$ module load fusionmap
[user@pxxx]$ cd /data/userID/fusionmap/run1
[user@pxxx]$ mono $FMBIN/FusionMap.exe .... [options]
[user@pxxx]$ ...
[user@pxxx]$ exit

Documentation

http://www.omicsoft.com/fusionmap/