High-Performance Computing at the NIH
MOSAIK

Description

MOSAIK is a reference-guided assembler comprising four main modular programs, among them MosaikBuild and MosaikAligner.

MOSAIK is a stable, sensitive, open-source program for mapping second- and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific Biosciences SMRT.

The MOSAIK suite was written by Michael Strömberg of the Marth lab at Boston College.

How To Use

Multiple versions of MOSAIK are available. The easiest way to select one is with environment modules. To see the available modules, type

module avail mosaik

To select a module, type

module load mosaik/[ver]

where [ver] is the version of choice. This will set your $PATH variable, as well as two other variables. The ones used on this page are $MOSAIK_HOME and $MOSAIK_TMP.

Interactive use

NOTE: If MOSAIK is run on Helix, set $MOSAIK_TMP to /scratch:

export MOSAIK_TMP=/scratch
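
Before a long run it can be worth confirming that the directory named by $MOSAIK_TMP exists and is writable. The following is a minimal sketch, not part of MOSAIK: check_mosaik_tmp is a hypothetical helper, and the /tmp fallback is only a stand-in for when the variable is unset.

```shell
# check_mosaik_tmp: hypothetical helper (not shipped with MOSAIK).
# Succeeds if the given path is an existing, writable directory.
check_mosaik_tmp() {
    dir="$1"
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "MOSAIK_TMP ok: $dir"
    else
        echo "MOSAIK_TMP not usable: $dir" >&2
        return 1
    fi
}

# Example call. /tmp is an assumed default for illustration only.
check_mosaik_tmp "${MOSAIK_TMP:-/tmp}" || echo "choose another directory before running MOSAIK" >&2
```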

MOSAIK comes with a demo directory. Copy it to your own working area and run the scripts in order:

cp -Rp $MOSAIK_HOME/demo .
cd demo
bash Build.sh
bash Align.sh
module load bamtools
bash RetrainMQ.sh
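
The demo steps are meant to run in sequence, so it can help to stop at the first failure rather than continue with missing inputs. A minimal sketch, assuming the script names shown above; run_in_order is a hypothetical helper, not part of the demo.

```shell
# run_in_order: hypothetical helper. Runs each script with bash and
# stops at the first one that exits with a non-zero status.
run_in_order() {
    for script in "$@"; do
        echo "== $script"
        bash "$script" || { echo "failed: $script" >&2; return 1; }
    done
}

# Usage inside the copied demo directory, with the mosaik and bamtools
# modules already loaded:
#   run_in_order Build.sh Align.sh RetrainMQ.sh
```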

As a Slurm batch job

Create a batch script:

#!/bin/bash
#----- This file is Mosaik.sh -----#
#SBATCH --job-name=Mosaik
#SBATCH --error=Mosaik.err
#SBATCH --output=Mosaik.out

# Start in the directory the job was submitted from
cd "$SLURM_SUBMIT_DIR"

# Set the environment using mosaik module
module load mosaik

# Build the Mosaik .dat file for reads
MosaikBuild -fr myreads.fasta -fq myreads.fasta.qual -out myreads.dat

# Build the Mosaik .dat file for the reference chromosome
MosaikBuild -fr myreference.fasta -oa myreference.dat

# Align the reads to the reference chromosome using 8 processors
MosaikAligner -in myreads.dat -out myreads_aligned.dat -ia myreference.dat -hs 15 -mm 4 -m all -mhp 100 -act 20 -j myjumpdb -p 8

The -p option sets the number of CPUs MosaikAligner uses during execution. This batch script uses -p 8, so the job must be allocated 8 CPUs per task. MOSAIK also typically requires at least 24 GB of memory. After a run, use jobhist to check the job's actual memory usage. Submit the job with matching resources:

[biowulf]$ sbatch --cpus-per-task=8 --mem=24g Mosaik.sh
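
Because the -p value and --cpus-per-task have to agree, one option is to read the thread count from the Slurm environment instead of hard-coding it. The sketch below assumes the same file names and aligner options as the batch script above. SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested; the fallback of 1 is an assumption for running outside a job. It only prints the command (a dry run).

```shell
# Thread count from the Slurm allocation; default to 1 when the variable
# is unset, for example when testing outside a batch job (assumed fallback).
threads="${SLURM_CPUS_PER_TASK:-1}"

# Same aligner options as in Mosaik.sh, with -p taken from the allocation.
aligner_args="-in myreads.dat -out myreads_aligned.dat -ia myreference.dat -hs 15 -mm 4 -m all -mhp 100 -act 20 -j myjumpdb -p $threads"

# Dry run: print the command. Remove 'echo' to run the alignment.
# $aligner_args is left unquoted on purpose so it splits into separate options.
echo MosaikAligner $aligner_args
```

With this in the batch script, changing --cpus-per-task on the sbatch command line is enough to change the number of aligner threads.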

Documentation