Biowulf High Performance Computing at the NIH
svanna on Biowulf

SvAnna is an efficient and accurate pathogenicity prediction tool for coding and regulatory structural variants in long-read genome sequencing.

References:

Documentation
Important Notes
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):

[user@biowulf]$ sinteractive --mem=8g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load svanna
[user@cn3144 ~]$ mkdir /data/$USER/svanna_test/
[user@cn3144 ~]$ cd /data/$USER/svanna_test/
[user@cn3144 ~]$ java -jar $SVANNA_PATH/svanna-cli-1.0.3.jar prioritize -d $SVANNA_DB/2304 --help
Prioritize the variants.
Usage: svanna-cli.jar prioritize [-hVv] [--no-breakends] [--uncompressed-output] -d=path/to/datadir
                                 [--frequency-threshold=] [--ic-mica-mode={DATABASE,IN_MEMORY}]
                                 [--min-read-support=] [--n-threads=2] [--out-dir=]
                                 [--output-format=html] [--overlap-threshold=] [-p=]
                                 [--prefix=] [--promoter-fitness-gain=]
                                 [--promoter-length=] [--report-top-variants=100]
                                 [--term-similarity-measure={RESNIK_SYMMETRIC, RESNIK_ASYMETRIC}] [--vcf=]
                                 [-t=]...
  -v                         Specify multiple -v options to increase verbosity.
                             For example, `-v -v -v` or `-vvv`
  -d, --data-directory=path/to/datadir
                             Path to SvAnna data directory.
  -h, --help                 Show this help message and exit.
  -V, --version              Print version information and exit.
SvAnna configuration:
      --term-similarity-measure={RESNIK_SYMMETRIC, RESNIK_ASYMETRIC}
                             Phenotype term similarity measure (default: RESNIK_SYMMETRIC).
      --ic-mica-mode={DATABASE,IN_MEMORY}
                             The mode for getting information content of the most informative common ancestors for
                               terms t1, and t2 (default: DATABASE).
      --promoter-length=
                             Number of bases prepended to a transcript and evaluated as a promoter region (default:
                               2000).
      --promoter-fitness-gain=
                             Set to 0. to score the promoter variants as strictly as coding variants, or to 1. to skip
                               them altogether (default: 0.6).
Analysis input:
  -p, --phenopacket=
                             Path to v1 or v2 phenopacket in JSON, YAML or Protobuf format.
  -t, --phenotype-term=
                             HPO term ID(s). Can be provided multiple times.
      --vcf=            Path to the input VCF file.
Run options:
      --frequency-threshold=
                             Frequency threshold as a percentage [0-100] (default: 1.0).
      --overlap-threshold=
                             Percentage threshold for determining variant's region is similar enough to database entry
                               (default: 80.0).
      --min-read-support=
                             Minimum number of ALT reads to prioritize (default: 3).
      --n-threads=2          Process variants using n threads (default: 2).
Output options:
      --no-breakends         Do not include breakend variants into HTML report (default: false).
      --output-format=html   Comma separated list of output formats to use for writing the results (default: html).
      --out-dir=     Path to folder where to write the output files (default: current working directory).
      --prefix=   Prefix for output files (default: based on the input VCF name).
      --report-top-variants=100
                             Report top n variants (default: 100).
      --uncompressed-output  Write tabular and VCF output formats with no compression (default: false).
See the full documentation at `https://svanna.readthedocs.io/en/master`

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
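
The help text above notes that -t/--phenotype-term can be provided multiple times. When the HPO terms for a sample are kept in a list, the repeated options can be generated rather than typed by hand. The following is a minimal sketch, not part of SvAnna or the Biowulf module: the function name build_term_args is hypothetical, and the term IDs are the ones used in the examples on this page.

```shell
#!/bin/bash
# Hypothetical helper: turn a list of HPO term IDs into repeated
# --phenotype-term options for "svanna-cli prioritize".
build_term_args() {
    local args=()
    local term
    for term in "$@"; do
        args+=(--phenotype-term "$term")
    done
    # Print all options on one line, separated by single spaces.
    echo "${args[*]}"
}

# Print the options for three example terms.
build_term_args HP:0011890 HP:0000978 HP:0012147
```

The printed string can be pasted onto the end of a prioritize command line. HPO IDs contain no whitespace, so joining them with spaces is safe here.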

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. svanna.sh). For example:


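#!/bin/bash
# Stop at the first failing command.
set -e
module load svanna
# Copy the test VCF files into the current working directory.
cp "$SVANNA_TEST_DATA"/*.vcf .
# Prioritize the variants in example.vcf for three HPO phenotype terms.
java -jar "$SVANNA_PATH/svanna-cli-1.0.3.jar" prioritize -d "$SVANNA_DB/2304" --vcf example.vcf --phenotype-term HP:0011890 --phenotype-term HP:0000978 --phenotype-term HP:0012147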

Submit this job using the Slurm sbatch command.

sbatch --mem=8g svanna.sh
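
The help output lists --n-threads (default: 2), so a batch job that requests more CPUs may want to pass the allocation through to SvAnna. The sketch below assumes the job is submitted with the standard sbatch option --cpus-per-task, in which case Slurm sets SLURM_CPUS_PER_TASK. The function name svanna_cmd is hypothetical, and it only prints the command so that it can be inspected before it is run.

```shell
#!/bin/bash
# Hypothetical helper: print a prioritize command whose thread count
# follows the Slurm CPU allocation.
svanna_cmd() {
    local vcf="$1"
    shift
    # SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested;
    # fall back to 2, the default shown in the help text.
    local threads="${SLURM_CPUS_PER_TASK:-2}"
    # Any remaining arguments (e.g. --phenotype-term options) are appended.
    local cmd=(java -jar "$SVANNA_PATH/svanna-cli-1.0.3.jar" prioritize
               -d "$SVANNA_DB/2304" --vcf "$vcf" "--n-threads=$threads" "$@")
    echo "${cmd[*]}"
}

# Show the command that would be run for example.vcf.
svanna_cmd example.vcf --phenotype-term HP:0011890
```

A job using this approach could be submitted with, for example, sbatch --cpus-per-task=4 --mem=8g svanna.sh. Whether more threads shorten the run depends on the input, so it is worth testing on a small VCF first.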
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. svanna.swarm). For example:

java -jar $SVANNA_PATH/svanna-cli-1.0.3.jar prioritize -d $SVANNA_DB/2304 --vcf example1.vcf --out-dir results --prefix example1  --phenotype-term HP:0011890 --phenotype-term HP:0000978 --phenotype-term HP:0012147
java -jar $SVANNA_PATH/svanna-cli-1.0.3.jar prioritize -d $SVANNA_DB/2304 --vcf example2.vcf --out-dir results --prefix example2  --phenotype-term HP:0011890 --phenotype-term HP:0000978 --phenotype-term HP:0012147

Submit this job using the swarm command.

swarm -f svanna.swarm [-t #] [-g #] --module svanna
where
-g #             Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #             Number of threads/CPUs required for each process (1 line in the swarm command file)
--module svanna  Loads the svanna module for each subjob in the swarm
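
With many input files, the swarmfile can be generated instead of written by hand. The sketch below prints one command line per VCF, deriving --prefix from the file name as in the example swarmfile above. The function name make_swarm_lines is hypothetical, and the phenotype terms are simply the ones used on this page; substitute the terms that apply to each sample.

```shell
#!/bin/bash
# Hypothetical helper: print one swarm command line per VCF file.
make_swarm_lines() {
    local terms="--phenotype-term HP:0011890 --phenotype-term HP:0000978 --phenotype-term HP:0012147"
    local vcf prefix
    for vcf in "$@"; do
        # Derive the output prefix from the file name: example1.vcf -> example1
        prefix="$(basename "$vcf" .vcf)"
        # \$ keeps the variables literal so they are expanded when the subjob
        # runs, after the svanna module has been loaded.
        echo "java -jar \$SVANNA_PATH/svanna-cli-1.0.3.jar prioritize -d \$SVANNA_DB/2304 --vcf $vcf --out-dir results --prefix $prefix $terms"
    done
}

# Print the lines for two inputs.
make_swarm_lines example1.vcf example2.vcf
```

Redirecting the output, for example make_swarm_lines *.vcf > svanna.swarm, produces a file that can be passed to swarm -f as shown above.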