Biowulf High Performance Computing at the NIH
svclone on Biowulf

From the description on the GitHub repo:

This package is used to cluster structural variants of similar cancer cell fraction (CCF). SVclone is divided into five components: annotate, count, filter, cluster and post-assign.

The annotate step infers the directionality of each breakpoint (if not supplied), recalibrates breakpoint positions to the soft-clip boundary and subsequently classifies SVs using a rule-based approach. The count step counts the variant and non-variant reads at breakpoint locations. Both the annotate and count steps utilise BAM-level information. The filter step removes SVs based on a number of adjustable parameters and prepares the variants for clustering; SNVs can also be added at this step, as can CNV information, which is matched to SV and SNV loci. Any variants that were filtered out, or left out due to sub-sampling, can be added back using the post-assign step, which assigns each variant (with, at minimum, a VAF > 0 and a matching copy-number state) to the most likely cluster obtained from the cluster step. Post-processing scripts are also included to aid in visualising the clustering results.
Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --gres=lscratch:10 --mem=10g -c4
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ ml svclone
[user@cn3144]$ cp -L $SVCLONE_TEST_DATA/* .
[user@cn3144]$ ls -lh
total 5.6M
-rw-r--r-- 1 user group   55 Jul 19 16:07 purity_ploidy.txt
-rw-r--r-- 1 user group 8.8K Jul 19 16:07 svclone_test_coclus.ini
-rw-r--r-- 1 user group 8.8K Jul 19 16:07 svclone_test.ini
-rw-r--r-- 1 user group  29K Jul 19 16:07 tumour_p80_DEL_snvs.vcf
-rw-r--r-- 1 user group 5.5M Jul 19 16:07 tumour_p80_DEL_sv_extract_sorted.bam
-rw-r--r-- 1 user group 9.2K Jul 19 16:07 tumour_p80_DEL_sv_extract_sorted.bam.bai
-rw-r--r-- 1 user group 2.2K Jul 19 16:07 tumour_p80_DEL_svs_simple.txt

[user@cn3144]$ input=tumour_p80_DEL_svs_simple.txt
[user@cn3144]$ bam=tumour_p80_DEL_sv_extract_sorted.bam
[user@cn3144]$ sample=tumour_p80_DEL
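The input referenced by --sv_format simple is, per the SVclone README, a tab-separated table of paired breakpoints. A minimal hedged sketch of that layout (positions here are made up; check the bundled tumour_p80_DEL_svs_simple.txt for the authoritative format):

```shell
# Hedged sketch of the "simple" SV input format consumed by
# `SVclone.py annotate --sv_format simple`: a tab-separated file with one
# breakpoint pair per line. Column names follow the SVclone README; verify
# against the bundled tumour_p80_DEL_svs_simple.txt before relying on this.
printf 'chr1\tpos1\tchr2\tpos2\n'   >  example_svs_simple.txt
printf '1\t1000000\t1\t1500000\n'   >> example_svs_simple.txt
printf '2\t2000000\t2\t2600000\n'   >> example_svs_simple.txt
```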

[user@cn3144]$ # Annotate break points
[user@cn3144]$ SVclone.py annotate -i $input -b $bam -s $sample --sv_format simple -cfg svclone_test.ini
Loading SV calls...
Inferring SV directions...
Classifying SVs...
Writing SV output...

[user@cn3144]$ # Count breakpoint reads
[user@cn3144]$ SVclone.py count -i ${sample}/${sample}_svin.txt -b $bam -s $sample -cfg svclone_test.ini
...
Extracting data from 100 SVs

[user@cn3144]$ # Filtering out low confidence breaks
[user@cn3144]$ SVclone.py filter -s $sample -i ${sample}/${sample}_svinfo.txt -p purity_ploidy.txt -cfg svclone_test.ini
Filtered out 0 SVs based on size limits
Filtered out 0 SVs based on minimum depth limit
Filtered out 0 SVs based on spanning/split read limits
No CNV input defined, assuming all loci major/minor allele copy-numbers are ploidy/2
Final filtered SV count: 100
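The filter step's -p option points at a purity/ploidy file; if none is found, SVclone falls back to purity 1.0 and ploidy 2.0. A minimal sketch of the expected tab-separated layout (column names taken from the SVclone documentation; the purity_ploidy.txt shipped in $SVCLONE_TEST_DATA is the authoritative example):

```shell
# Hedged sketch of a purity/ploidy file for `SVclone.py filter -p`.
# Column names are assumed from the SVclone docs; compare with the
# purity_ploidy.txt shipped alongside the test data.
printf 'sample\tpurity\tploidy\n'  >  my_purity_ploidy.txt
printf 'tumour_p80_DEL\t0.8\t2\n'  >> my_purity_ploidy.txt
```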

[user@cn3144]$ SVclone.py cluster -s $sample -cfg svclone_test.ini
Clustering 100 SVs...
Thread 0; cluster run: 0
Thread 1; cluster run: 1
Thread 2; cluster run: 2
Thread 3; cluster run: 3
Random seed for run 0 is 4013
Random seed for run 2 is 2067
Random seed for run 1 is 678
Random seed for run 3 is 1681
phi lower limit: 0.039231; phi upper limit: 1.000000
Dirichlet concentration gamma prior values: alpha = 0.100000; beta= 0.500000; init = 0.200000
...


   clus_id  size   phi
0        0   100  0.82
...

   clus_id  size     phi
0        0    53  0.7923
1        1    47  0.8436
Compiling and writing output...
Selecting run 0 as best run for SVs

[user@cn3144]$ # post-assign variants
[user@cn3144]$ SVclone.py post_assign -s $sample --svs ${sample}/${sample}_svinfo.txt -cfg svclone_test.ini
No CNV input defined, assuming all loci major/minor allele copy-numbers are ploidy/2
Reassigning all 100 variants
Writing output to tumour_p80_DEL/best_run_svs_post_assign/ for SVs

[user@cn3144]$ # diagnostic plots
[user@cn3144]$ Rscript $SVCLONE_RSCRIPTS/post_process_fit_diagnostics.R $sample $sample --map
[1] "Finished plotting diagnostic plots!"

[user@cn3144]$ ls -lh tumour_p80_DEL
total 196K
drwxr-xr-x 2 user group 4.0K Jul 19 16:12 best_run_svs
drwxr-xr-x 2 user group 4.0K Jul 19 16:12 best_run_svs_post_assign
-rw-r--r-- 1 user group   54 Jul 19 16:10 purity_ploidy.txt
-rw-r--r-- 1 user group   80 Jul 19 16:09 read_params.txt
drwxr-xr-x 2 user group 4.0K Jul 19 16:12 run0
drwxr-xr-x 2 user group 4.0K Jul 19 16:12 run1
drwxr-xr-x 2 user group 4.0K Jul 19 16:11 run2
drwxr-xr-x 2 user group 4.0K Jul 19 16:12 run3
-rw-r--r-- 1 user group 7.9K Jul 19 16:13 tumour_p80_DEL_best_run_svs_post_assign_best_fit.pdf
-rw-r--r-- 1 user group 4.5K Jul 19 16:13 tumour_p80_DEL_cluster_hist.pdf
-rw-r--r-- 1 user group  23K Jul 19 16:12 tumour_p80_DEL_filtered_svs_post_assign.tsv
-rw-r--r-- 1 user group  22K Jul 19 16:10 tumour_p80_DEL_filtered_svs.tsv
-rw-r--r-- 1 user group  126 Jul 19 16:13 tumour_p80_DEL_ic_metrics.csv
-rw-r--r-- 1 user group 8.4K Jul 19 16:13 tumour_p80_DEL_run0_fit.pdf
-rw-r--r-- 1 user group 7.9K Jul 19 16:13 tumour_p80_DEL_run1_fit.pdf
-rw-r--r-- 1 user group 7.9K Jul 19 16:13 tumour_p80_DEL_run2_fit.pdf
-rw-r--r-- 1 user group 8.3K Jul 19 16:13 tumour_p80_DEL_run3_fit.pdf
-rw-r--r-- 1 user group  31K Jul 19 16:13 tumour_p80_DEL_run_summary.pdf
-rw-r--r-- 1 user group  16K Jul 19 16:10 tumour_p80_DEL_svinfo.txt
-rw-r--r-- 1 user group 5.1K Jul 19 16:09 tumour_p80_DEL_svin.txt
[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$

Note that the number of threads used by the clustering algorithm is set in the configuration file.
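For example, the config file might contain a clustering section along these lines. The section and key names below are illustrative only, not copied from svclone_test.ini; check your copy of the config for the actual names (e.g. with grep -ni thread svclone_test.ini):

```ini
; Illustrative only -- section and key names may differ in svclone_test.ini.
[cluster]
threads = 4
n_runs = 4
```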

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. svclone.sh) similar to the following example, which runs the test data set. Note that the number of threads is set in the svclone_test.ini config file.

#!/bin/bash
ml svclone
cp -L $SVCLONE_TEST_DATA/* .

input=tumour_p80_DEL_svs_simple.txt
bam=tumour_p80_DEL_sv_extract_sorted.bam
sample=tumour_p80_DEL

SVclone.py annotate -i $input -b $bam -s $sample --sv_format simple -cfg svclone_test.ini
SVclone.py count -i ${sample}/${sample}_svin.txt -b $bam -s $sample -cfg svclone_test.ini
SVclone.py filter -s $sample -i ${sample}/${sample}_svinfo.txt -p purity_ploidy.txt -cfg svclone_test.ini
SVclone.py cluster -s $sample -cfg svclone_test.ini
SVclone.py post_assign -s $sample --svs ${sample}/${sample}_svinfo.txt -cfg svclone_test.ini
Rscript $SVCLONE_RSCRIPTS/post_process_fit_diagnostics.R $sample $sample --map

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=2 --mem=10g svclone.sh
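Because the pipeline writes many intermediate files, it can be worth staging the run in local scratch. A hedged variant of the batch script above (assumes lscratch is requested at submission time; /lscratch/$SLURM_JOB_ID only exists when --gres=lscratch:N is allocated):

```shell
# Sketch: same pipeline as svclone.sh, run from local scratch and copied
# back on completion. Assumes lscratch was requested at submission time.
cat > svclone_lscratch.sh <<'EOF'
#!/bin/bash
set -e
ml svclone
cd /lscratch/$SLURM_JOB_ID
cp -L $SVCLONE_TEST_DATA/* .

input=tumour_p80_DEL_svs_simple.txt
bam=tumour_p80_DEL_sv_extract_sorted.bam
sample=tumour_p80_DEL

SVclone.py annotate -i $input -b $bam -s $sample --sv_format simple -cfg svclone_test.ini
SVclone.py count -i ${sample}/${sample}_svin.txt -b $bam -s $sample -cfg svclone_test.ini
SVclone.py filter -s $sample -i ${sample}/${sample}_svinfo.txt -p purity_ploidy.txt -cfg svclone_test.ini
SVclone.py cluster -s $sample -cfg svclone_test.ini
SVclone.py post_assign -s $sample --svs ${sample}/${sample}_svinfo.txt -cfg svclone_test.ini
Rscript $SVCLONE_RSCRIPTS/post_process_fit_diagnostics.R $sample $sample --map

cp -r $sample $SLURM_SUBMIT_DIR  # copy results back before lscratch is purged
EOF
```

Submit with the additional lscratch request:

sbatch --cpus-per-task=2 --mem=10g --gres=lscratch:10 svclone_lscratch.sh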