Parsnp is a command-line tool for efficient microbial core genome alignment and SNP detection. It was designed to work in tandem with Gingr, a flexible platform for visualizing genome alignments and phylogenetic trees.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=16g --cpus-per-task=16
[user@cn3335 ~]$ module load parsnp
[+] Loading singularity 3.10.3 on cn3335
[+] Loading parsnp 1.7.4 ...
[user@cn3335 ~]$ parsnp -h
|--Parsnp v1.2--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
usage: parsnp [options] [-g|-r|-q](see below) -d <genome_dir> -p <threads>

Parsnp quick start for three example scenarios:
1) With reference & genbank file:
 >parsnp -g <reference_genbank_file1,reference_genbank_file2,..> -d <genome_dir> -p <threads>

2) With reference but without genbank file:
 >parsnp -r <reference_genome> -d <genome_dir> -p <threads>

3) Autorecruit reference to a draft assembly:
 >parsnp -q <draft_assembly> -d <genome_db> -p <threads>

[Input parameters]
<<input/output>>
 -c = <flag>: (c)urated genome directory, use all genomes in dir and ignore MUMi? (default = NO)
 -d = <path>: (d)irectory containing genomes/contigs/scaffolds
 -r = <path>: (r)eference genome (set to ! to pick random one from genome dir)
 -g = <string>: Gen(b)ank file(s) (gbk), comma separated list (default = None)
 -o = <string>: output directory? default [./P_CURRDATE_CURRTIME]
 -q = <path>: (optional) specify (assembled) query genome to use, in addition to genomes found in genome dir (default = NONE)

<<MUMi>>
 -U = <float>: max MUMi distance value for MUMi distribution
 -M = <flag>: calculate MUMi and exit? overrides all other choices! (default: NO)
 -i = <float>: max MUM(i) distance (default: autocutoff based on distribution of MUMi values)

<<MUM search>>
 -a = <int>: min (a)NCHOR length (default = 1.1*Log(S))
 -C = <int>: maximal cluster D value? (default=100)
 -z = <path>: min LCB si(z)e? (default = 25)

<<LCB alignment>>
 -D = <float>: maximal diagonal difference? Either percentage (e.g. 0.2) or bp (e.g. 100bp) (default = 0.12)
 -e = <flag> greedily extend LCBs? experimental! (default = NO)
 -n = <string>: alignment program (default: libMUSCLE)
 -u = <flag>: output unaligned regions? .unaligned (default: NO)

<<Recombination filtration>>
 -x = <flag>: enable filtering of SNPs located in PhiPack identified regions of recombination? (default: NO)

<<Misc>>
 -h = <flag>: (h)elp: print this message and exit
 -p = <int>: number of threads to use? (default= 1)
 -P = <int>: max partition size? limits memory usage (default= 15000000)
 -v = <flag>: (v)erbose output? (default = NO)
 -V = <flag>: output (V)ersion and exit

Download sample data to the current folder:
[user@cn3335 ~]$ cp -r $PARSNP_DATA/* .

Run parsnp on the sample data:
[user@cn3335 ~]$ parsnp -g ref/England1.gbk -d genomes -c
|--Parsnp v1.2--|
For detailed documentation please see --> http://harvest.readthedocs.org/en/latest
*****************************************************************************
SETTINGS:
|-refgenome: ref/England1.gbk.fna
|-aligner: libMUSCLE
|-seqdir: genomes
|-outdir: /data/user/parsnp/P_2022_10_20_091716517037
|-OS: Linux
|-threads: 32
*****************************************************************************

<<Parsnp started>>

-->Reading Genome (asm, fasta) files from genomes..
  |->[OK]
-->Reading Genbank file(s) for reference (.gbk) ref/England1.gbk..
  |->[OK]
-->Running Parsnp multi-MUM search and libMUSCLE aligner..
  |->[OK]
-->Running PhiPack on LCBs to detect recombination..
  |->[SKIP]
-->Reconstructing core genome phylogeny..
  |->[OK]
-->Creating Gingr input file..
  |->[OK]
-->Calculating wall clock time..
  |->Aligned 47 genomes in 0.65 seconds

<<Parsnp finished! All output available in /data/user/parsnp/P_2022_10_20_091716517037>>

Validating output directory contents...
 1)parsnp.tree: newick format tree [OK]
 2)parsnp.ggr: harvest input file for gingr (GUI) [OK]
 3)parsnp.xmfa: XMFA formatted multi-alignment [OK]

[user@cn3335 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
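
The sample run above does not pass -p or -o, so the thread count and output directory are left to parsnp. The following is a minimal sketch, not part of the original session, of one way to pass both explicitly and then check for the three output files that parsnp lists at the end of its log. It assumes the session was started with --cpus-per-task so that Slurm sets SLURM_CPUS_PER_TASK; the helper names parsnp_threads and missing_outputs and the directory name parsnp_out are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# Sketch: run parsnp with a thread count matching the Slurm allocation,
# then confirm that the three expected output files were written.

# Thread count for -p: the CPUs Slurm allocated, or 1 outside a Slurm job.
parsnp_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Print the name of every expected output file that is missing or empty
# in directory $1. Returns non-zero if at least one file is missing.
missing_outputs() {
    local outdir=$1 status=0 f
    for f in parsnp.tree parsnp.ggr parsnp.xmfa; do
        if [ ! -s "$outdir/$f" ]; then
            echo "$f"
            status=1
        fi
    done
    return "$status"
}

# Run only when parsnp is on PATH (that is, after `module load parsnp`).
if command -v parsnp >/dev/null 2>&1; then
    outdir=parsnp_out   # hypothetical output directory name
    parsnp -g ref/England1.gbk -d genomes -c -p "$(parsnp_threads)" -o "$outdir"
    missing_outputs "$outdir" || echo "parsnp output incomplete in $outdir" >&2
fi
```

Reading the thread count from the environment keeps the command unchanged if the allocation size changes, and a fixed -o directory makes the results easier to find than the timestamped default.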