StrainGE: Strain-level Genome Exploration

StrainGE is a set of tools for analysing within-species strain diversity in bacterial populations. It consists of two main components: 1) StrainGST (Strain Genome Search Tool), which finds close reference genomes for strains present in a sample, and 2) StrainGR (Strain Genome Recovery), which performs strain-aware variant calling at low coverage.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive
[user@cn3144 ~]$ module load strainge
[+] Loading singularity  4.0.1   on cn3144
[+] Loading strainge  1.3.9
[user@cn3144 ~]$ strainge 
2024-05-29 11:11:45,581 - WARNING:root:DEPRECATION WARNING - the `strainge` CLI program is deprecated, please use `straingst` or `straingr` instead.
usage: strainge [-h] [--version] [-v] {kmerize,kmersim,cluster,createdb,search,call,view,compare,tree,stats,plot} ...

================================
StrainGE: Strain Genome Explorer
================================
A set of tools for strain-level analysis in mixed metagenomic samples
---------------------------------------------------------------------

Version: 1.3.9

DEPRECATED: please use `straingst` or `straingr` instead.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v, --verbose         Increase verbosity level, number of levels: 0, 1, 2

Subcommands:
  {kmerize,kmersim,cluster,createdb,search,call,view,compare,tree,stats,plot}
    kmerize             K-merize a given reference sequence or a sample read dataset.
    kmersim             Compare k-mer sets with each other. Both all-vs-all and one-vs-all is supported.
    cluster             Group k-mer sets that are very similar to each other together.
    createdb            Create pan-genome database in HDF5 format from a list of k-merized strains.
    search              StrainGST: strain genome search tool. Identify close reference genomes to strains present in a
                        sample.
    call                StrainGR: strain-aware variant caller for metagenomic samples
    view                View call statistics stored in a HDF5 file and output results to different file formats
    compare             Compare strains and variant calls in two different samples. Reads of both samples must be aligned to
                        the same reference.
    tree                Build an approximate phylogenetic tree based on a given distance matrix, using neighbour joining.
    stats               Obtain statistics about a given k-mer set.
    plot                Generate plots for a given k-mer set.
[user@cn3144 ~]$ straingr 
usage: straingr [-h] [--version] [-v] {prepare-ref,call,view,compare,dist,tree} ...

================================
StrainGE: Strain Genome Explorer
================================
A set of tools for strain-level analysis in mixed metagenomic samples
---------------------------------------------------------------------

Version: 1.3.9

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v, --verbose         Increase verbosity level, number of levels: 0, 1, 2

Subcommands:
  {prepare-ref,call,view,compare,dist,tree}
    prepare-ref         Prepare a concatenated reference for StrainGR variant calling.
    call                StrainGR: strain-aware variant caller for metagenomic samples
    view                View call statistics stored in a HDF5 file and output results to different file formats
    compare             Compare strains and variant calls in two different samples. Reads of both samples must be aligned to
                        the same reference.
    dist                For all strains across multiple samples close to the same reference genome, calculate the pairwise
                        genetic distance and output it in matrix form.
    tree                Build an approximate phylogenetic tree based on a given distance matrix, using neighbour joining.
[user@cn3144 ~]$ straingst
usage: straingst [-h] [--version] [-v] {kmerize,kmersim,kmermerge,cluster,createdb,stats,plot,run} ...

================================
StrainGE: Strain Genome Explorer
================================
A set of tools for strain-level analysis in mixed metagenomic samples
---------------------------------------------------------------------

Version: 1.3.9

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v, --verbose         Increase verbosity level, number of levels: 0, 1, 2

Subcommands:
  {kmerize,kmersim,kmermerge,cluster,createdb,stats,plot,run}
    kmerize             K-merize a given reference sequence or a sample read dataset.
    kmersim             Compare k-mer sets with each other. Both all-vs-all and one-vs-all is supported.
    kmermerge           Merge k-mer set files.
    cluster             Group k-mer sets that are very similar to each other together.
    createdb            Create pan-genome database in HDF5 format from a list of k-merized strains.
    stats               Obtain statistics about a given k-mer set.
    plot                Generate plots for a given k-mer set.
    run                 StrainGST: strain genome search tool. Identify close reference genomes to strains present in a
                        sample.

End the interactive session:
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
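
Because the `strainge` command is deprecated, existing scripts may need their subcommands moved to `straingst` or `straingr`. The sketch below shows one way to look up the replacement. The helper name `strainge_replacement` is hypothetical, and the mapping is inferred only from the subcommand lists and descriptions in the help output above (for example, `strainge search` and `straingst run` carry the same description). It is not taken from upstream migration notes, so check each result against `straingst --help` and `straingr --help`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a deprecated `strainge` subcommand to its
# replacement, using only the subcommand lists shown in the help output.
strainge_replacement() {
    case "$1" in
        # Subcommands that also appear under `straingst`
        kmerize|kmersim|cluster|createdb|stats|plot)
            echo "straingst $1" ;;
        # Assumption: `search` became `run` (identical help description)
        search)
            echo "straingst run" ;;
        # Subcommands that also appear under `straingr`
        call|view|compare|tree)
            echo "straingr $1" ;;
        *)
            echo "no known replacement for: $1" >&2
            return 1 ;;
    esac
}

strainge_replacement search   # prints: straingst run
strainge_replacement call     # prints: straingr call
```

The subcommands `kmermerge`, `prepare-ref`, and `dist` appear only in the newer commands and have no counterpart in the deprecated `strainge` list, so the helper does not cover them.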