m-tools on Biowulf

m-tools is a selection of software (including CoverM, GraftM, and SingleM) developed at the Australian Centre for Ecogenomics to aid in the analysis of metagenomic datasets.

These applications were bundled into a single Singularity container in February 2021. Many of the tools are now outdated, and the container will no longer be updated. For tools that have newer releases, we recommend installing them in your user data directory; see our DIY Installation docs or contact staff@hpc.nih.gov.
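
As a minimal sketch of such a self-install, assuming you maintain a personal conda installation under /data/$USER (the environment path, channels, and the choice of CoverM are illustrative):

# Create and activate a per-user environment in your data directory,
# then verify the install. Adjust the tool and version to your needs.
conda create -p /data/$USER/envs/coverm -c conda-forge -c bioconda coverm
conda activate /data/$USER/envs/coverm
coverm --version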

Documentation
    CoverM on GitHub: https://github.com/wwood/CoverM
    GraftM on GitHub: https://github.com/geronimp/graftM
    SingleM on GitHub: https://github.com/wwood/singlem

Important Notes
    Module Name: m-tools (see the modules page for more information)

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input in bold):

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load m-tools

[user@cn3144 ~]$ singlem -h
...::: SingleM v0.13.2 :::...

General usage:
  pipe         -> Generate an OTU table from raw sequences.
  summarise    -> Summarise and transform OTU tables.
  renew        -> Reannotate an OTU table with an updated taxonomy

Databases (of OTU sequences):
  makedb       -> Create a searchable database from an OTU table
  query        -> Find closely related sequences in a database.

Assembly and binning:
  appraise     -> How much of the metagenome do the genomes or assembly represent?

Packages (to search with):
  seqs         -> Find the best window for a SingleM package.
  create       -> Create a SingleM package.
  get_tree     -> Extract path to Newick tree file in a SingleM package.
  regenerate   -> Update a SingleM package with a new GraftM package (expert mode).

Use singlem <command> -h for command-specific help.
Some commands also have an extended --full_help flag.


[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
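
A typical first step with SingleM is generating an OTU table from raw reads with the pipe subcommand. A minimal sketch, assuming a single FASTQ input (file names and thread count are illustrative; option names may differ in SingleM versions other than the bundled v0.13.2):

singlem pipe --sequences sample.fastq.gz \
             --otu_table otu_table.csv \
             --threads $SLURM_CPUS_PER_TASK

Inside an sinteractive session, $SLURM_CPUS_PER_TASK matches the number of CPUs you requested at allocation time.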

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. m-tools.sh). For example:

#!/bin/bash
set -e
module load m-tools
# Compute per-genome coverage from paired reads mapped against two reference genomes
coverm genome --coupled read1.fastq.gz read2.fastq.gz \
    --genome-fasta-files genome1.fna genome2.fna \
    --threads $SLURM_CPUS_PER_TASK \
    -o output.tsv

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] m-tools.sh
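
For example, to request 8 CPUs and 16 GB of memory (illustrative values; size them to your data):

sbatch --cpus-per-task=8 --mem=16g m-tools.sh
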
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. m-tools.swarm). For example:

graftM graft --threads $SLURM_CPUS_PER_TASK \
             --forward input.R1.fa --reverse input.R2.fa \
             --graftm_package graftm_package1.gpkg \
             --output_directory graftm_out1
graftM graft --threads $SLURM_CPUS_PER_TASK \
             --forward input.R1.fa --reverse input.R2.fa \
             --graftm_package graftm_package2.gpkg \
             --output_directory graftm_out2
graftM graft --threads $SLURM_CPUS_PER_TASK \
             --forward input.R1.fa --reverse input.R2.fa \
             --graftm_package graftm_package3.gpkg \
             --output_directory graftm_out3
graftM graft --threads $SLURM_CPUS_PER_TASK \
             --forward input.R1.fa --reverse input.R2.fa \
             --graftm_package graftm_package4.gpkg \
             --output_directory graftm_out4

Submit this job using the swarm command.

swarm -f m-tools.swarm [-g #] [-t #] --module m-tools
where
  -g #               Number of Gigabytes of memory required for each process (1 line in the swarm command file).
  -t #               Number of threads/CPUs required for each process (1 line in the swarm command file).
  --module m-tools   Loads the m-tools module for each subjob in the swarm.
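
For example, to give each graftM command 8 CPUs and 16 GB of memory (illustrative values; -t should match the thread count each command receives through $SLURM_CPUS_PER_TASK):

swarm -f m-tools.swarm -g 16 -t 8 --module m-tools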