VirSorter2 is a DNA and RNA virus identification tool. It leverages genome-informed database advances across a collection of customized automatic classifiers to improve the accuracy and range of virus sequence detection.
Allocate an interactive session and run the program.
Sample session:
[user@biowulf ~]$ sinteractive --mem=100g --cpus-per-task=4 --gres=lscratch:50
[user@cn2379 ~]$ module load virsorter2
[+] Loading singularity 3.8.2 on cn0853
[+] Loading virsorter2 2.2.3
[user@cn2379 ~]$ mkdir -p /data/$USER/VirSorter2 && cd /data/$USER/VirSorter2

Configuring VirSorter2 (needs to be done only once):
[user@cn2379 ~]$ wget https://osf.io/v46sc/download -O db.tgz
[user@cn2379 ~]$ gunzip -c db.tgz | tar -xvf -
[user@cn2379 ~]$ vs2 config --init-source --db-dir=$PWD/db
[2021-09-30 06:23 INFO] VirSorter 2.2.3
[2021-09-30 06:23 INFO] /VirSorter/bin/virsorter config --init-source --db-dir=/data/user/db
[2021-09-30 06:23 INFO] Attention: can not write template-config.yaml in source directory:
/VirSorter/lib/python3.7/site-packages/virsorter
makeing a copy to user home direcotry:
/home/user/.virsorter/template-config.yaml
[2021-09-30 06:23 INFO] Using {template} as config template
[2021-09-30 06:23 INFO] saving /data/user/VirSorter2/db as DBDIR to config file /home/user/.virsorter/template-config.yaml

Running VirSorter2 on sample data:
[user@cn2379 ~]$ cp $VS2_DATA/* .
[user@cn2379 ~]$ vs2 run -w test.out -i test-for-sop.fa --min-length 1500 -j 4 all
[2021-09-30 17:03 INFO] VirSorter 2.1
[2021-09-30 17:03 INFO] /miniconda/bin/virsorter run -w test.out -i test-for-sop.fa --min-length 1500 -j 4 all
[2021-10-04 06:24 INFO] VirSorter 2.2.3
[2021-10-04 06:24 INFO] /opt/conda/envs/vs2/bin/virsorter config --init-source --db-dir=./db
[2021-10-04 06:24 INFO] Attention: can not write template-config.yaml in source directory:
/opt/conda/envs/vs2/lib/python3.8/site-packages/virsorter
makeing a copy to user home direcotry:
/home/user/.virsorter/template-config.yaml
[2021-10-04 06:24 INFO] Using {template} as config template
[2021-10-04 06:24 INFO] saving /gs7/users/user/VirSorter2/db as DBDIR to config file /home/user/.virsorter/template-config.yaml
[user@cn0864 VirSorter2]$ ls db/conda_envs
[userga@cn0864 VirSorter2]$ vs2 run -w test.out -i test-for-sop.fa --min-length 1500 -j 4 all
[2021-10-04 06:24 INFO] VirSorter 2.2.3
[2021-10-04 06:24 INFO] /opt/conda/envs/vs2/bin/virsorter run -w test.out -i test-for-sop.fa --min-length 1500 -j 4 all
[2021-10-04 06:24 INFO] Using /home/user/.virsorter/template-config.yaml as config template
[2021-10-04 06:24 INFO] conig file written to /gs7/users/user/VirSorter2/test.out/config.yaml
[2021-10-04 06:24 INFO] Executing: snakemake --snakefile /opt/conda/envs/vs2/lib/python3.8/site-packages/virsorter/Snakefile --directory /gs7/users/user/VirSorter2/test.out --jobs 4 --configfile /gs7/users/user/VirSorter2/test.out/config.yaml --latency-wait 600 --rerun-incomplete --nolock --conda-frontend mamba --conda-prefix /gs7/users/user/VirSorter2/db/conda_envs --use-conda --quiet all
Job counts:
    count   jobs
    1       all
    1       check_point_for_reclassify
    1       circular_linear_split
    1       classify
    2       classify_by_group
    2       classify_full_and_part_by_group
    1       combine_linear_circular
    2       combine_linear_circular_by_group
    1       extract_feature
    1       extract_provirus_seqs
    1       finalize
    1       gff_feature
    2       gff_feature_by_group
    2       hmm_features_by_group
    1       hmm_sort_to_best_hit_taxon
    2       hmm_sort_to_best_hit_taxon_by_group
    1       merge_classification
    1       merge_full_and_part_classification
    2       merge_hmm_gff_features_by_group
    2       merge_provirus_call_by_group_by_split
    1       merge_provirus_call_from_groups
    5       merge_split_hmmtbl
    10      merge_split_hmmtbl_by_group
    10      merge_split_hmmtbl_by_group_tmp
    1       pick_viral_fullseq
    1       preprocess
    1       split_faa
    2       split_faa_by_group
    2       split_gff_by_group
    61
[2021-10-04 06:26 INFO] # of seqs < 1500 bp and removed: 0
[2021-10-04 06:26 INFO] # of circular seqs: 0
[2021-10-04 06:26 INFO] # of linear seqs : 7
[2021-10-04 06:26 INFO] No circular seqs found in contig file
[2021-10-04 06:26 INFO] Finish spliting linear contig file with common rbs
[2021-10-04 06:26 INFO] Step 1 - preprocess finished.
[2021-10-04 06:32 INFO] Step 2 - extract-feature finished.
[2021-10-04 06:33 INFO]
    ====> VirSorter run (provirus mode) finished.
    # of full seqs (>=2 genes) as viral: 6
    # of partial seqs (>=2 genes) as viral: 1
    # of short seqs (< 2 genes) as viral: 0

    Useful output files:
        final-viral-score.tsv ==> score table
        final-viral-combined.fa ==> all viral seqs
        final-viral-boundary.tsv ==> table with boundary info

    Suffix is added to seq names in final-viral-combined.fa:
        full seqs (>=2 genes) as viral: ||full
        partial seqs (>=2 genes) as viral: ||partial
        short seqs (< 2 genes) as viral: ||lt2gene

    NOTES:
    Users can further screen the results based on the following columns in final-viral-score.tsv:
        - contig length (length)
        - hallmark gene count (hallmark)
        - viral gene % (viral)
        - cellular gene % (cellular)
    The group field in final-viral-score.tsv should NOT be used as reliable taxonomy info
    <====
[2021-10-04 06:33 INFO] Step 3 - classify finished.
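The NOTES printed at the end of the run suggest screening final-viral-score.tsv on columns such as length and hallmark. The following is a minimal sketch of such a filter, not part of VirSorter2 itself. It assumes the table is tab-separated with a header row that names those columns. The function name screen_scores and the default thresholds (5000 bp, 1 hallmark gene) are hypothetical values chosen for illustration, not recommendations.

```shell
#!/bin/bash
# Sketch: keep rows of a VirSorter2 score table that pass simple cutoffs.
# Assumptions (not from the original page): tab-separated input, a header
# row containing "length" and "hallmark", illustrative default thresholds.
screen_scores() {
    local tsv=$1
    local min_len=${2:-5000}       # hypothetical minimum contig length
    local min_hallmark=${3:-1}     # hypothetical minimum hallmark gene count
    awk -F'\t' -v min_len="$min_len" -v min_hallmark="$min_hallmark" '
        NR == 1 {
            # Map column names to positions so column order does not matter.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("length" in col) || !("hallmark" in col)) {
                print "columns length/hallmark not found in header" > "/dev/stderr"
                exit 1
            }
            print            # keep the header in the output
            next
        }
        # Force numeric comparison with +0.
        ($col["length"] + 0) >= (min_len + 0) &&
        ($col["hallmark"] + 0) >= (min_hallmark + 0)
    ' "$tsv"
}
```

Assuming the score table sits in the working directory passed with -w, a call such as `screen_scores test.out/final-viral-score.tsv 5000 1 > screened.tsv` would write the header plus the rows that pass both cutoffs. Looking columns up by name rather than by position keeps the sketch independent of the exact column order.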
Create a batch input file (e.g. virsorter2.sh). For example:
#!/bin/bash
set -e
module load virsorter2
cp $VS2_DATA/* .
vs2 run -w test.out -i test-for-sop.fa --min-length 1500 -j 4 all
Submit this job using the Slurm sbatch command.
sbatch virsorter2.sh
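The batch script can also carry its own resource requests, so that a plain sbatch call reserves what the run needs. The sketch below writes such a variant of virsorter2.sh and checks its syntax. The #SBATCH values are simply copied from the interactive session shown earlier and may need tuning for real data. Passing SLURM_CPUS_PER_TASK to -j (with a fallback of 4) is an assumption of this example, not something the original script does.

```shell
#!/bin/bash
# Sketch: write a batch script with embedded Slurm resource requests.
# The resource values mirror the interactive example and are assumptions.
cat > virsorter2.sh <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=100g
#SBATCH --gres=lscratch:50
set -e
module load virsorter2
cp $VS2_DATA/* .
# Use the CPU count granted by Slurm; fall back to 4 outside a job.
vs2 run -w test.out -i test-for-sop.fa --min-length 1500 \
    -j "${SLURM_CPUS_PER_TASK:-4}" all
EOF

# Parse the script without executing it to catch syntax errors early.
bash -n virsorter2.sh
```

Tying -j to SLURM_CPUS_PER_TASK keeps the thread count in step with the allocation if the --cpus-per-task request is changed later.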