Phylogenetic Assignment of Named Global Outbreak LINeages. pangolin was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. It allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf ~]$ sinteractive -c4 --mem=8g --gres=lscratch:10
salloc.exe: Pending job allocation 12440727
salloc.exe: job 12440727 queued and waiting for resources
salloc.exe: job 12440727 has been allocated resources
salloc.exe: Granted job allocation 12440727
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0913 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.12440727.0
slurmstepd: error: x11: unable to read DISPLAY value

[user@cn0913 ~]$ cd /lscratch/$SLURM_JOB_ID

[user@cn0913 12440727]$ module load pangolin
[+] Loading pangolin  2.3.6  on cn0913
[+] Loading singularity  3.7.3  on cn0913

[user@cn0913 12440727]$ cp $PANGOLIN_TESTDATA/* .

[user@cn0913 12440727]$ mkdir tempdir

[user@cn0913 12440727]$ pangolin --tempdir=./tempdir cluster.fasta
Found the snakefile
The query file is:/lscratch/12440727/cluster.fasta
EDB003    sequence too short
EDB004    has an N content of 0.98
Looking in /opt/conda/envs/pangolin/lib/python3.7/site-packages/pangoLEARN/data for data files...
Data files found
Trained model:  /opt/conda/envs/pangolin/lib/python3.7/site-packages/pangoLEARN/data/decisionTree_v1.joblib
Header file:    /opt/conda/envs/pangolin/lib/python3.7/site-packages/pangoLEARN/data/decisionTreeHeaders_v1.joblib
Lineages csv:   /opt/conda/envs/pangolin/lib/python3.7/site-packages/pangoLEARN/data/lineages.metadata.csv
Job counts:
        count   jobs
        1       add_failed_seqs
        1       align_to_reference
        1       all
        1       minimap2_check_distance
        1       overwrite
        1       pangolearn
        1       parse_paf
        1       type_variants_b117
        1       type_variants_b12142
        1       type_variants_b1351
        1       type_variants_p1
        1       type_variants_p2
        1       type_variants_p3
        13
Job counts:
        count   jobs
        1       parse_paf
        1
[M::mm_idx_gen::0.003*1.49] collected minimizers
[M::mm_idx_gen::0.004*1.33] sorted minimizers
[M::main::0.004*1.33] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.004*1.30] mid_occ = 100
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.004*1.28] distinct minimizers: 2952 (100.00% are singletons); average occurrences: 1.000; average spacing: 10.130
warning: using --pad without --trim has no effect
[M::worker_pipeline::0.095*1.00] mapped 8 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -a -x asm5 -t 1 /opt/conda/envs/pangolin/lib/python3.7/site-packages/pangolin-2.3.6-py3.7.egg/pangolin/data/reference.fasta /lscratch/12440727/./tempdir/tmpat6kcm3p/mappable.fasta
[M::main] Real time: 0.096 sec; CPU: 0.096 sec; Peak RSS: 0.003 GB
loading model 04/09/2021, 13:23:32
processing block of 8 sequences 04/09/2021, 13:23:33
complete 04/09/2021, 13:23:34
Job counts:
        count   jobs
        1       add_failed_seqs
        1
Job counts:
        count   jobs
        1       overwrite
        1
Output file written to: /lscratch/12440727/lineage_report.csv

[user@cn0913 12440727]$ cat lineage_report.csv
taxon,lineage,probability,pangoLEARN_version,status,note
EDB001,B.1.1.26,1.0,2021-03-29,passed_qc,
EDB002,B.1.1.26,1.0,2021-03-29,passed_qc,
EDB005,B,1.0,2021-03-29,passed_qc,
EDB006,B.1.1.26,1.0,2021-03-29,passed_qc,
EDB007,B.1.1.26,1.0,2021-03-29,passed_qc,
EDB008,B.1.1.26,1.0,2021-03-29,passed_qc,
EDB009,B.1.1.26,1.0,2021-03-29,passed_qc,
EDB010,B,1.0,2021-03-29,passed_qc,
EDB003,None,0,2021-03-29,fail,seq_len:2997
EDB004,None,0,2021-03-29,fail,N_content:0.98

[user@cn0913 12440727]$ exit
exit
salloc.exe: Relinquishing job allocation 12440727

[user@biowulf ~]$
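Once the run finishes, the assignments in lineage_report.csv can be summarized directly on the command line. As a minimal sketch (assuming the column layout shown above, with the lineage in column 2 and the QC status in column 5), the following awk one-liner counts how many passing sequences were assigned to each lineage:

# count sequences that passed QC, grouped by assigned lineage
awk -F, 'NR > 1 && $5 == "passed_qc" {n[$2]++} END {for (l in n) print l, n[l]}' lineage_report.csv

For the test data above this reports 6 sequences assigned to B.1.1.26 and 2 to B; the two failed sequences (EDB003 and EDB004) are excluded.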
Create a batch input file (e.g. pangolin.sh). For example:
#!/bin/bash
set -e
module load pangolin
pangolin --tempdir=/lscratch/$SLURM_JOB_ID my.fasta
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] [--gres=lscratch:#] pangolin.sh
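For example, to request the same resources as the interactive session above (the values are illustrative; adjust them to your input):

sbatch --cpus-per-task=4 --mem=8g --gres=lscratch:10 pangolin.sh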
Create a swarmfile (e.g. pangolin.swarm). For example:
pangolin --tempdir=/lscratch/$SLURM_JOB_ID A.fasta
pangolin --tempdir=/lscratch/$SLURM_JOB_ID B.fasta
pangolin --tempdir=/lscratch/$SLURM_JOB_ID C.fasta
pangolin --tempdir=/lscratch/$SLURM_JOB_ID D.fasta
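For many input files, the swarmfile can be generated instead of typed by hand. A minimal sketch, assuming the query FASTA files sit in the current directory and one pangolin command per file is wanted (the dollar sign is escaped so that $SLURM_JOB_ID is expanded by each subjob at run time, not while building the file):

# write one pangolin command per FASTA file to the swarmfile
for f in *.fasta; do
    echo "pangolin --tempdir=/lscratch/\$SLURM_JOB_ID $f"
done > pangolin.swarm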
Submit this job using the swarm command.
swarm -f pangolin.swarm [-g #] [-t #] [--gres=lscratch:#] --module pangolin
where
-g #                Number of Gigabytes of memory required for each process (1 line in the swarm command file)
-t #                Number of threads/CPUs required for each process (1 line in the swarm command file)
--module pangolin   Loads the pangolin module for each subjob in the swarm
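For example, a swarm submission matching the resources used in the interactive session above (illustrative values):

swarm -f pangolin.swarm -g 8 -t 4 --gres=lscratch:10 --module pangolin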