A selection of software developed at the Australian Centre for Ecogenomics to aid in the analysis of metagenomic datasets:
These applications were bundled into a single Singularity container in February 2021. Many of the tools are now outdated and, as such, the container will no longer be updated. For tools that have newer releases, we recommend installing them in your user data directory. See our DIY Installation docs or contact staff@hpc.nih.gov.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load m-tools
[user@cn3144 ~]$ singlem -h

                ...::: SingleM v0.13.2 :::...

General usage:
    pipe         -> Generate an OTU table from raw sequences.
    summarise    -> Summarise and transform OTU tables.
    renew        -> Reannotate an OTU table with an updated taxonomy

Databases (of OTU sequences):
    makedb       -> Create a searchable database from an OTU table
    query        -> Find closely related sequences in a database.

Assembly and binning:
    appraise     -> How much of the metagenome do the genomes or assembly represent?

Packages (to search with):
    seqs         -> Find the best window for a SingleM package.
    create       -> Create a SingleM package.
    get_tree     -> Extract path to Newick tree file in a SingleM package.
    regenerate   -> Update a SingleM package with a new GraftM package (expert mode).

Use singlem <command> -h for command-specific help.
Some commands also have an extended --full_help flag.

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
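As a next step inside the session, a typical SingleM run can be saved as a small script. This is a sketch only: the sample filenames are placeholders, and the flags should be checked against singlem pipe -h before use.

```shell
# Sketch: save a SingleM 'pipe' run as a script (filenames are placeholders)
cat > run_singlem.sh <<'EOF'
#!/bin/bash
set -e
module load m-tools
singlem pipe --sequences sample.fastq.gz \
             --otu_table sample.otu_table.csv \
             --threads 4
EOF

# Verify the script parses before running it in the session
bash -n run_singlem.sh && echo "syntax OK"
```

Keeping the command in a script makes it easy to rerun the same analysis later as a batch job.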
Create a batch input file (e.g. m-tools.sh). For example:
#!/bin/bash
set -e
module load m-tools
coverm genome --coupled read1.fastq.gz read2.fastq.gz \
    --genome-fasta-files genome1.fna genome2.fna \
    -o output.tsv
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] m-tools.sh
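Once the batch job finishes, output.tsv holds one row per genome. Assuming the default layout of a header line followed by genome name and mean coverage columns (the numbers below are invented for illustration), a quick filter looks like:

```shell
# Hypothetical coverm output: header plus one row per genome (values invented)
printf 'Genome\tread1 Mean\ngenome1\t12.3\ngenome2\t0.0\n' > output.tsv

# Print genomes with non-zero mean coverage (skip the header line)
awk -F'\t' 'NR > 1 && $2 > 0 { print $1 }' output.tsv
```

For the fabricated table above, this prints only genome1; check your own output header for the actual column order.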
Create a swarmfile (e.g. m-tools.swarm). For example:
graftM graft --threads $SLURM_CPUS_PER_TASK \
    --forward input.R1.fa --reverse input.R2.fa \
    --graftm_package graftm_package1.gpkg \
    --output_directory graftm_out1
graftM graft --threads $SLURM_CPUS_PER_TASK \
    --forward input.R1.fa --reverse input.R2.fa \
    --graftm_package graftm_package2.gpkg \
    --output_directory graftm_out2
graftM graft --threads $SLURM_CPUS_PER_TASK \
    --forward input.R1.fa --reverse input.R2.fa \
    --graftm_package graftm_package3.gpkg \
    --output_directory graftm_out3
graftM graft --threads $SLURM_CPUS_PER_TASK \
    --forward input.R1.fa --reverse input.R2.fa \
    --graftm_package graftm_package4.gpkg \
    --output_directory graftm_out4
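Since the four commands differ only in the package number, the swarmfile can also be generated with a loop instead of written by hand (package and output names follow the example above):

```shell
# Generate m-tools.swarm: one graftM command per package, one swarm subjob each
for i in 1 2 3 4; do
    echo "graftM graft --threads \$SLURM_CPUS_PER_TASK --forward input.R1.fa --reverse input.R2.fa --graftm_package graftm_package${i}.gpkg --output_directory graftm_out${i}"
done > m-tools.swarm

# Each line in the file becomes one subjob
wc -l < m-tools.swarm
```

Note the escaped \$SLURM_CPUS_PER_TASK: the variable must survive into the swarmfile so it expands inside each subjob, not when the file is generated.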
Submit this job using the swarm command.
swarm -f m-tools.swarm [-g #] [-t #] --module m-tools
where
-g #              Number of Gigabytes of memory required for each process (1 line in the swarm command file)
-t #              Number of threads/CPUs required for each process (1 line in the swarm command file)
--module m-tools  Loads the m-tools module for each subjob in the swarm