VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses

VirSorter2 is a DNA and RNA virus identification tool. It leverages genome-informed database advances across a collection of customized automatic classifiers to improve the accuracy and range of virus sequence detection.

Interactive job
Allocate an interactive session and run the program.
Sample session:

[user@biowulf ~]$ sinteractive --mem=100g --cpus-per-task=4 --gres=lscratch:50
[user@cn2379 ~]$ module load virsorter2
[+] Loading singularity  3.8.2  on cn0853
[+] Loading virsorter2  2.2.3
[user@cn2379 ~]$ mkdir -p /data/$USER/VirSorter2 && cd /data/$USER/VirSorter2
Configuring VirSorter2 (needs to be done only once):
[user@cn2379 ~]$ wget -O db.tgz 
[user@cn2379 ~]$ gunzip -c db.tgz | tar -xvf -
[user@cn2379 ~]$ vs2 config --init-source --db-dir=$PWD/db
[2021-09-30 06:23 INFO] VirSorter 2.2.3
[2021-09-30 06:23 INFO] /VirSorter/bin/virsorter config --init-source --db-dir=/data/user/db
[2021-09-30 06:23 INFO] Attention: can not write template-config.yaml in source directory:
makeing a copy to user home direcotry:

[2021-09-30 06:23 INFO] Using {template} as config template
[2021-09-30 06:23 INFO] saving /data/user/VirSorter2/db as DBDIR to config file /home/user/.virsorter/template-config.yaml
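
Because the configuration step only needs to run once, a small guard can tell whether it has already been done. This is a minimal sketch that assumes the template config ends up in $HOME/.virsorter/template-config.yaml, as the log message above reports; the function name vs2_config_needed is hypothetical and not part of VirSorter2.

```shell
# Hypothetical helper (not part of VirSorter2): succeeds (exit status 0)
# when the per-user template config is missing, i.e. setup is still needed.
# The default path is an assumption taken from the log message above.
vs2_config_needed() {
    [ ! -f "${1:-$HOME/.virsorter/template-config.yaml}" ]
}

if vs2_config_needed; then
    echo "template config not found: run 'vs2 config --init-source --db-dir=\$PWD/db' first"
else
    echo "template config found: setup already done"
fi
```

The check only looks for the config file; it does not verify that the database directory recorded in it still exists.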
Running VirSorter2 on sample data:
[user@cn2379 ~]$ cp $VS2_DATA/* .
[user@cn2379 ~]$ vs2 run -w test.out -i test-for-sop.fa --min-length 1500 -j 4 all 
[2021-10-04 06:24 INFO] VirSorter 2.2.3
[2021-10-04 06:24 INFO] /opt/conda/envs/vs2/bin/virsorter run -w test.out -i test-for-sop.fa --min-length 1500 -j 4 all
[2021-10-04 06:24 INFO] Using /home/user/.virsorter/template-config.yaml as config template
[2021-10-04 06:24 INFO] conig file written to /gs7/users/user/VirSorter2/test.out/config.yaml

[2021-10-04 06:24 INFO] Executing: snakemake --snakefile /opt/conda/envs/vs2/lib/python3.8/site-packages/virsorter/Snakefile --directory /gs7/users/user/VirSorter2/test.out --jobs 4 --configfile /gs7/users/user/VirSorter2/test.out/config.yaml --latency-wait 600 --rerun-incomplete --nolock  --conda-frontend mamba --conda-prefix /gs7/users/user/VirSorter2/db/conda_envs --use-conda    --quiet  all
[2021-10-04 06:26 INFO] # of seqs < 1500 bp and removed: 0
[2021-10-04 06:26 INFO] # of circular seqs: 0
[2021-10-04 06:26 INFO] # of linear seqs  : 7
[2021-10-04 06:26 INFO] No circular seqs found in contig file
[2021-10-04 06:26 INFO] Finish spliting linear contig file with common rbs
[2021-10-04 06:26 INFO] Step 1 - preprocess finished.
[2021-10-04 06:32 INFO] Step 2 - extract-feature finished.
[2021-10-04 06:33 INFO]
            ====> VirSorter run (provirus mode) finished.
            # of full    seqs (>=2 genes) as viral:     6
            # of partial seqs (>=2 genes) as viral:     1
            # of short   seqs (< 2 genes) as viral:     0

            Useful output files:
                final-viral-score.tsv       ==> score table
                final-viral-combined.fa     ==> all viral seqs
                final-viral-boundary.tsv    ==> table with boundary info

                Suffix is added to seq names in final-viral-combined.fa:
                full    seqs (>=2 genes) as viral:      ||full
                partial seqs (>=2 genes) as viral:      ||partial
                short   seqs (< 2 genes) as viral:      ||lt2gene

            Users can further screen the results based on the following
                columns in final-viral-score.tsv:
                - contig length (length)
                - hallmark gene count (hallmark)
                - viral gene % (viral)
                - cellular gene % (cellular)
            The group field in final-viral-score.tsv should NOT be used
                as reliable taxonomy info


[2021-10-04 06:33 INFO] Step 3 - classify finished.
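
The screening suggested in the log above (by length, hallmark, viral and cellular columns) can be sketched with awk. The example below builds a tiny made-up table, so the rows and the thresholds are illustrative assumptions rather than VirSorter2 recommendations. The helper name screen_scores and the seqname column are also assumptions; check the header of your own final-viral-score.tsv before reusing this.

```shell
# Hypothetical helper: keep rows of a tab-separated score table whose
# length/hallmark/viral/cellular columns pass the given thresholds.
# Columns are located by header name, so their order does not matter.
#   $1 = table, $2 = min length, $3 = min hallmark count,
#   $4 = min viral gene %, $5 = max cellular gene %
screen_scores() {
    awk -F'\t' -v minlen="$2" -v minhall="$3" -v minvir="$4" -v maxcell="$5" '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i   # header name to column index
            print
            next
        }
        $(col["length"]) >= minlen && $(col["hallmark"]) >= minhall &&
        $(col["viral"]) >= minvir && $(col["cellular"]) <= maxcell
    ' "$1"
}

# Made-up demo table (NOT real VirSorter2 output).
printf 'seqname\tlength\thallmark\tviral\tcellular\n' >  demo-score.tsv
printf 'contigA||full\t12000\t3\t60.0\t5.0\n'        >> demo-score.tsv
printf 'contigB||full\t2000\t0\t10.0\t40.0\n'        >> demo-score.tsv

# Illustrative thresholds only.
screen_scores demo-score.tsv 5000 1 50 10
```

With the demo table, only the header and the contigA row pass the filters. On real data, replace demo-score.tsv with test.out/final-viral-score.tsv.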

Batch job
Create a batch input file. For example:

#!/bin/bash
set -e
module load virsorter2
cp $VS2_DATA/* .  
vs2 run -w test.out -i test-for-sop.fa --min-length 1500 -j 4 all
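
A batch script like this is typically submitted with the Slurm sbatch command. The sketch below assumes the script was saved as virsorter2.sh (a hypothetical file name) and reuses the resource values from the interactive session above. It submits only when sbatch and the script are both present, and otherwise just prints the command.

```shell
# Hypothetical file name for the batch script shown above.
script=virsorter2.sh

# Resource requests mirror the sinteractive example; adjust as needed.
cmd="sbatch --cpus-per-task=4 --mem=100g --gres=lscratch:50 $script"

if command -v sbatch >/dev/null 2>&1 && [ -f "$script" ]; then
    $cmd            # submit the job
else
    echo "$cmd"     # dry run: just show the command
fi
```

Here --cpus-per-task=4 is chosen to match the -j 4 passed to vs2 run in the script.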

