SynthDNM is a random-forest-based classifier that can be readily adapted to new sequencing or variant-calling pipelines by applying a flexible approach to constructing simulated training examples from real data. The optimized SynthDNM classifiers predict de novo SNPs and indels with robust accuracy across multiple methods of variant calling.
Allocate an interactive session and run the program.
Sample session:
[user@biowulf ~]$ sinteractive --mem=8g -c4 --gres=lscratch:20
[user@cn2379 ~]$ module load synthdnm
[+] Loading singularity 3.8.4 on cn2379
[+] Loading synthdnm 1.1.50
Basic usage:
[user@cn2379 ~]$ run_synthdnm.py -h
usage: run_synthdnm.py [-h] [--vcf_file VCF_FILE] --ped_file PED_FILE
                       [--region REGION] [--features_file FEATURES_FILE]
                       [--output_folder OUTPUT_FOLDER]
                       [--training_set_tsv TRAINING_SET_TSV]
                       {classify,make_training_set,train,grid_search} ...

SynthDNM: a de novo mutation classifier and training paradigm

positional arguments:
  {classify,make_training_set,train,grid_search}
                        Available sub-commands
    classify            Classify DNMs using pre-trained classifiers.
    make_training_set   Make training set.
    train               Train classifiers
    grid_search         Randomized grid search across hyperparameters.

optional arguments:
  -h, --help            show this help message and exit
  --vcf_file VCF_FILE   VCF file input
  --ped_file PED_FILE   Pedigree file (.fam/.ped/.psam) input
  --region REGION       Interval ('{}' or '{}:{}-{}' in format of chr or
                        chr:start-end) on which to run training or
                        classification
  --features_file FEATURES_FILE
                        Features file input
  --output_folder OUTPUT_FOLDER
                        Output folder for output files (if not used, then
                        output folder is set to 'synthdnm_output')
  --training_set_tsv TRAINING_SET_TSV
                        Training set file (created using make_training_set
                        mode)

[user@cn2379 ~]$ run_synthdnm.py classify -h
usage: run_synthdnm.py classify [-h] --clf_folder CLF_FOLDER
                                [-feature_extraction_only]

optional arguments:
  -h, --help            show this help message and exit
  --clf_folder CLF_FOLDER
                        Folder that contains the classifiers, which must be in
                        .pkl format (if not specified, will look for them in
                        the default data folder)
  -feature_extraction_only
                        Only output the features file (without classifying

End the interactive session:
[user@cn2379 ~]$ exit
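Going by the help text above, a classification run might be assembled as in the following sketch. The global options (--vcf_file, --ped_file, --output_folder) are placed before the sub-command, as in the usage line. The input file names and the classifier folder are placeholders rather than values confirmed by this page, and the command is only executed if run_synthdnm.py is found on the PATH.

```shell
#!/bin/bash
# Sketch only: all paths below are placeholders (assumptions), not documented defaults.
vcf=tutorial.vcf               # assumed input VCF
ped=tutorial.ped               # assumed pedigree file (.fam/.ped/.psam)
clf_dir=/path/to/classifiers   # hypothetical folder holding .pkl classifiers
out_dir=synthdnm_output        # output folder name mentioned in the help text

# Global options precede the sub-command, following the usage line.
cmd=(run_synthdnm.py --vcf_file "$vcf" --ped_file "$ped"
     --output_folder "$out_dir"
     classify --clf_folder "$clf_dir")

# Show the assembled command for review.
printf '%s ' "${cmd[@]}"; echo

# Run it only when the tool is actually available (e.g. after 'module load synthdnm').
if command -v run_synthdnm.py >/dev/null 2>&1; then
    "${cmd[@]}"
fi
```

Printing the command first makes it easy to check the argument order before anything is executed.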
Create a batch input file (e.g. synthdnm.sh). For example:
#!/bin/bash
set -e
module load synthdnm
cp $SDNM_DATA/* .
synthdnm -v tutorial.vcf -f tutorial.ped
Submit this job using the Slurm sbatch command.
sbatch synthdnm.sh
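The plain sbatch call above relies on the default resource allocation. As a sketch, the resources used in the interactive example (4 CPUs, 8 GB of memory, 20 GB of local scratch) could be requested for the batch job as well. These values are simply copied from that example and are an assumption about what the job needs, not a recommendation. The script name synthdnm.sh is the one used above.

```shell
#!/bin/bash
# Resource values mirror the interactive session; adjust them to the data set.
script=synthdnm.sh
sbatch_args=(--cpus-per-task=4 --mem=8g --gres=lscratch:20)

# Show the command that would be submitted.
echo "sbatch ${sbatch_args[*]} $script"

# Submit only where Slurm is available and the batch script exists.
if command -v sbatch >/dev/null 2>&1 && [ -f "$script" ]; then
    sbatch "${sbatch_args[@]}" "$script"
fi
```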