High-Performance Computing at the NIH
MToolbox on Biowulf and Helix

MToolBox is a highly automated bioinformatics pipeline to reconstruct and analyze human mitochondrial DNA from high throughput sequencing data. It includes an updated computational strategy to assemble mitochondrial genomes from whole exome and/or genome sequencing and an improved fragment-classify tool for haplogroup assignment, functional and prioritization analysis of mitochondrial variants. It also provides pathogenicity scores, profiles of genome variability and disease-associations for mitochondrial variants and a Variant Call Format file featuring allele-specific heteroplasmy.

Citation

If you use MToolBox, please cite:

Calabrese C, Simone D, Diroma MA, Santorsola M, Gutt C, Gasparre G, Picardi E, Pesole G, Attimonelli M. MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing. Bioinformatics. 2014 Jul 14. pii: btu483. [Epub ahead of print] PubMed PMID: 25028726.

On Helix

To use MToolbox on either system, you must load the module.

module load MToolbox
cd /MToolbox/input/dir
MToolbox.sh -i <input_format>

You must specify the "-i" option with exactly one of the supported formats (sam, bam, fastq or fasta) per run. MToolbox must be run from within the folder that contains the input files. Before rerunning MToolbox in the same folder, delete all files produced by the previous run.
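
Because a run accepts only one format, it can help to check the input folder before starting. The following is a minimal sketch, not part of MToolbox: the function name detect_format is hypothetical, and it assumes that input files are uncompressed and named with the extensions .sam, .bam, .fastq or .fasta.

```shell
# Hypothetical helper (not shipped with MToolbox): print the single supported
# format found in a directory, or fail if none or more than one is present.
# Assumption: files are identified purely by their extension.
detect_format() {
    dir=$1
    found=""
    for fmt in sam bam fastq fasta; do
        for f in "$dir"/*."$fmt"; do
            # An unmatched glob stays literal, so test that the file exists.
            if [ -e "$f" ]; then
                found="$found $fmt"
                break
            fi
        done
    done
    found=${found# }   # drop the leading space
    case "$found" in
        "")    echo "no supported input files in $dir" >&2; return 1 ;;
        *" "*) echo "mixed formats in $dir: $found" >&2; return 1 ;;
        *)     echo "$found" ;;
    esac
}
```

For example, fmt=$(detect_format .) && MToolbox.sh -i "$fmt" would start a run only when the current folder holds a single format.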

For MToolbox help, at the prompt, type

MToolbox.sh -h 

Running a Single Batch Job on Biowulf

Create a batch input file, run_MToolbox:

#!/bin/bash
# ----- this file is run_MToolbox -----

module load MToolbox
cd MToolbox/input/files/dir
MToolBox.sh -i fastq -I -M -r RCRS \
    -m "-g /usr/local/apps/gmap-gsnap/2015-12-31/bin/gsnap -D /usr/local/apps/MToolbox/0.3/gmapdb/ -t8" \
    -a "-s /usr/local/apps/samtools/samtools-0.1.20/samtools -r /usr/local/apps/MToolbox/0.3/genome/" \
    -c "-m /usr/local/apps/muscle/3.8.31/bin/muscle"

The job can be submitted with

sbatch run_MToolbox

This command submits the job with the default allocation of 2 cores and 4 GB of memory. If you need more memory than the default 4 GB, request it with --mem, replacing # with the number of gigabytes:

sbatch --mem=#g run_MToolbox
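
The batch example passes -t8 to gsnap, while the default allocation is 2 cores. One way to keep the thread count in step with the allocation is to derive it from SLURM_CPUS_PER_TASK, which Slurm sets inside the job when it is submitted with sbatch's --cpus-per-task option. This is a sketch under that assumption; the helper name gsnap_threads is hypothetical and not part of MToolbox.

```shell
# Hypothetical helper: build the gsnap thread flag from the Slurm allocation.
# Falls back to 2 (the default core count stated above) when the variable
# is unset, e.g. when --cpus-per-task was not given.
gsnap_threads() {
    echo "-t${SLURM_CPUS_PER_TASK:-2}"
}

# Possible use inside run_MToolbox (other options omitted):
#   -m "-g <path to gsnap> -D <gmapdb dir> $(gsnap_threads)"
```

With this in the script, submitting with sbatch --cpus-per-task=8 run_MToolbox would yield -t8, and a default submission would yield -t2.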

Running a swarm of MToolbox jobs on Biowulf

The swarm program is designed to submit a group of commands to the Biowulf cluster. Each command is represented by a single line in the swarm command file that you create, and runs as a separate batch job. See the swarm page for more information.

Create a swarm command file, MToolbox_swarm. Example:

cd /MToolbox/input/dir1; MToolbox.sh -i <input_format> -r <reference_sequence> -m "<mapExome_options>" -a "<assembleMTgenome_options>" -c "<mt-classifier_options>"
cd /MToolbox/input/dir2; MToolbox.sh -i <input_format> -r <reference_sequence> -m "<mapExome_options>" -a "<assembleMTgenome_options>" -c "<mt-classifier_options>"
cd /MToolbox/input/dir3; MToolbox.sh -i <input_format> -r <reference_sequence> -m "<mapExome_options>" -a "<assembleMTgenome_options>" -c "<mt-classifier_options>"
cd /MToolbox/input/dir4; MToolbox.sh -i <input_format> -r <reference_sequence> -m "<mapExome_options>" -a "<assembleMTgenome_options>" -c "<mt-classifier_options>"
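
When there are many input directories, the command file can be generated rather than typed by hand. The sketch below assumes every directory uses the same MToolbox options and that directory names contain no spaces; the function name make_swarm is hypothetical.

```shell
# Hypothetical helper: print one swarm line per input directory.
# $1 is the option string passed to MToolbox.sh; the remaining arguments
# are the input directories (assumed to contain no spaces).
make_swarm() {
    opts=$1
    shift
    for dir in "$@"; do
        printf 'cd %s; MToolbox.sh %s\n' "$dir" "$opts"
    done
}

# Example:
#   make_swarm "-i fastq -r RCRS" /MToolbox/input/dir1 /MToolbox/input/dir2 > MToolbox_swarm
```

Redirecting the output to MToolbox_swarm produces a file in the same one-command-per-line layout as the example above.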

Submit this to the batch system with the command:

swarm -f MToolbox_swarm --module MToolbox

If each MToolbox job requires more than the default 4 GB of memory, use the -g option, replacing # with the number of gigabytes needed per job:

swarm -g # -f MToolbox_swarm --module MToolbox

For information on how to monitor your job(s), see Monitoring Jobs.

Running MToolbox interactively

If you want to run your job interactively, you can allocate a node for interactive use. Once the node is allocated, you can type commands directly on the command-line. Example:

[user@biowulf ~]$ sinteractive
salloc.exe: Pending job allocation 15323416
salloc.exe: job 15323416 queued and waiting for resources
salloc.exe: job 15323416 has been allocated resources
salloc.exe: Granted job allocation 15323416
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1640 are ready for job
[user@cn1640 ~]$ cd /MToolbox/input/dir
[user@cn1640 dir]$ MToolbox.sh -i <input_format> -r <reference_sequence> -m "<mapExome_options>" -a "<assembleMTgenome_options>" -c "<mt-classifier_options>"

If you need more memory than the default 4 GB, use sinteractive --mem=#g, replacing # with the number of gigabytes.

Documentation