Biowulf High Performance Computing at the NIH
SynthDNM: a random-forest based classifier for robust de novo prediction of SNPs and indels.

SynthDNM is a random-forest based classifier that can be readily adapted to new sequencing or variant-calling pipelines by applying a flexible approach to constructing simulated training examples from real data. The optimized SynthDNM classifiers predict de novo SNPs and indels with robust accuracy across multiple methods of variant calling.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (no GPU is requested or needed):

[user@biowulf ~]$ sinteractive --mem=8g -c4 --gres=lscratch:20
[user@cn2379 ~]$ module load synthdnm
[+] Loading singularity  3.8.4  on cn2379
[+] Loading synthdnm  1.1.50
Basic usage:
[user@cn2379 ~]$ synthdnm -h
usage:
 _______  __   __  __    _  _______  __   __  ______   __    _  __   __
|       ||  | |  ||  |  | ||       ||  | |  ||      | |  |  | ||  |_|  |
|  _____||  |_|  ||   |_| ||_     _||  |_|  ||  _    ||   |_| ||       |
| |_____ |       ||       |  |   |  |       || | |   ||       ||       |
|_____  ||_     _||  _    |  |   |  |       || |_|   ||  _    ||       |
 _____| |  |   |  | | |   |  |   |  |   _   ||       || | |   || ||_|| |
|_______|  |___|  |_|  |__|  |___|  |__| |__||______| |_|  |__||_|   |_|

Version 0.1.0.1    Authors: Danny Antaki, Aojie Lian, James Guevara
                   Contact: j3guevar@ucsd.health.edu
---------------------------------------------------------------------------------
    synthdnm  -f <in.fam>  -v  <in.vcf.gz>  [-oLgkVh]

necessary arguments:

  -v, --vcf    PATH    VCF file
  -f, --fam    PATH    PLINK pedigree (.fam/.ped) file

optional arguments:

  -e, --extract_features                             flag that disable classification (if you only want to extract features)
  -s, --snp_classifier                       PATH    path to snp classifier joblib file [default is pretrained classifier]
  -l, --indel_classifier                     PATH    path to indel classifier joblib file [default is pretrained classifier]
  -g  --gen                                  STR     human reference genome version [default: hg38]

  -h, --help           show this message and exit


optional arguments:
  -h, --help            show this help message and exit
  -v VCF, --vcf VCF
  -f FAM, --fam FAM
  -g {hg19,hg38}, --gen {hg19,hg38}
  -i INFO, --info INFO
  -e, --extract_features
  -s SNP_CLASSIFIER, --snp_classifier SNP_CLASSIFIER
  -l INDEL_CLASSIFIER, --indel_classifier INDEL_CLASSIFIER
Copy the sample data into the current directory:
[user@cn2379 ~]$ cp $SDNM_DATA/* .
Run SynthDNM on the sample data (per the help text above, -e extracts features only and skips classification):
[user@cn2379 ~]$ synthdnm -v tutorial.vcf -f tutorial.ped -e -s snp_200-auto-5.dv.joblib
args.snp_classifier= snp_200-auto-5.dv.joblib

End the interactive session:
[user@cn2379 ~]$ exit
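Before running SynthDNM on your own data, it can help to confirm that the pedigree file contains complete trios, since a de novo call is defined relative to both parents. The sketch below assumes the standard six-column PLINK .fam/.ped layout (family ID, individual ID, father ID, mother ID, sex, phenotype). The file names example.ped and trios.txt and the sample IDs are hypothetical, and how SynthDNM itself treats incomplete families is not covered here.

```shell
# Hypothetical three-person pedigree in PLINK .fam/.ped layout:
# family ID, individual ID, father ID, mother ID, sex, phenotype
printf 'FAM1\tchild1\tdad1\tmom1\t1\t2\n'  > example.ped
printf 'FAM1\tdad1\t0\t0\t1\t1\n'         >> example.ped
printf 'FAM1\tmom1\t0\t0\t2\t1\n'         >> example.ped

# Pass 1 (NR == FNR) records every family/individual pair.
# Pass 2 prints offspring whose father and mother both appear in the file.
awk 'NR == FNR { seen[$1 FS $2] = 1; next }
     {
         father = $1 FS $3
         mother = $1 FS $4
         if ((father in seen) && (mother in seen))
             print $1, $2, $3, $4
     }' example.ped example.ped > trios.txt

cat trios.txt
```

For the example pedigree this lists the single complete trio (FAM1 child1 dad1 mom1). An empty trios.txt would suggest that parent IDs in the pedigree do not match any individual IDs.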
Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. synthdnm.sh). For example:

#!/bin/bash
set -e
module load synthdnm
cp $SDNM_DATA/* .
synthdnm -v tutorial.vcf -f tutorial.ped

Submit this job using the Slurm sbatch command.

sbatch synthdnm.sh
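
The batch script above does not request any resources explicitly, so the job runs with the scheduler defaults. As a minimal sketch, assuming a batch job needs the same resources as the interactive session shown earlier (4 CPUs, 8 GB of memory, 20 GB of lscratch), the request can be embedded in the script as #SBATCH directives. These values are assumptions carried over from that session, not measured requirements, so adjust them to your data.

```shell
# Write the batch script with resource requests embedded as #SBATCH lines.
# The quoted 'EOF' keeps $SDNM_DATA unexpanded until the job actually runs.
cat > synthdnm.sh <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=8g
#SBATCH --gres=lscratch:20
set -e
module load synthdnm
cp "$SDNM_DATA"/* .
synthdnm -v tutorial.vcf -f tutorial.ped
EOF

# Syntax-check the script without executing it.
bash -n synthdnm.sh
```

The script is submitted as before with sbatch synthdnm.sh. The same requests can instead be given on the command line, for example sbatch --cpus-per-task=4 --mem=8g --gres=lscratch:20 synthdnm.sh.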