SVclone on Biowulf
From the description on the GitHub repo:
This package is used to cluster structural variants of similar cancer cell fraction (CCF). SVclone is divided into five components: annotate, count, filter, cluster and post-assign. The annotate step infers directionality of each breakpoint (if not supplied), recalibrates breakpoint position to the soft-clip boundary and subsequently classifies SVs using a rule-based approach. The count step counts the variant and non-variant reads from breakpoint locations. Both the annotate and count steps utilise BAM-level information. The filter step removes SVs based on a number of adjustable parameters and prepares the variants for clustering. SNVs can also be added at this step as well as CNV information, which is matched to SV and SNV loci. Any variants that were filtered out, or left out due to sub-sampling can be added back using the post-assign step, which assigns each variant (which contains a > 0 VAF and matching copy-number state, at minimum) to the most likely cluster (obtained from the cluster step). Post-processing scripts are also included to aid in visualising the clustering results.
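In practice, these five components correspond to five SVclone.py subcommands run in sequence. As a quick orientation, here is a minimal sketch of the workflow; the input file names, sample name, and config name are placeholders, and the exact invocations for the bundled test data are shown in the sample sessions below:

# Placeholder inputs: my_svs.txt (simple-format SV list), my.bam, sample "mysample", config my.ini
SVclone.py annotate -i my_svs.txt -b my.bam -s mysample --sv_format simple -cfg my.ini
SVclone.py count -i mysample/mysample_svin.txt -b my.bam -s mysample -cfg my.ini
SVclone.py filter -s mysample -i mysample/mysample_svinfo.txt -cfg my.ini
SVclone.py cluster -s mysample -cfg my.ini
SVclone.py post_assign -s mysample --svs mysample/mysample_svinfo.txt -cfg my.ini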
Documentation
- SVclone GitHub repo: https://github.com/mcmero/SVclone
Important Notes
- Module Name: svclone (see the modules page for more information)
- The clustering step is multithreaded. The number of threads is set in the configuration file, so make sure it matches the number of allocated CPUs.
- Example files in $SVCLONE_TEST_DATA (see the listing below)
- R scripts in $SVCLONE_RSCRIPTS
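To browse these before setting up a job, load the module and list both locations:

[user@biowulf]$ ml svclone
[user@biowulf]$ ls -lh $SVCLONE_TEST_DATA
[user@biowulf]$ ls -lh $SVCLONE_RSCRIPTS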
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --gres=lscratch:10 --mem=10g -c4
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ ml svclone
[user@cn3144]$ cp -L $SVCLONE_TEST_DATA/* .
[user@cn3144]$ ls -lh
total 5.6M
-rw-r--r-- 1 user group   55 Jul 19 16:07 purity_ploidy.txt
-rw-r--r-- 1 user group 8.8K Jul 19 16:07 svclone_test_coclus.ini
-rw-r--r-- 1 user group 8.8K Jul 19 16:07 svclone_test.ini
-rw-r--r-- 1 user group  29K Jul 19 16:07 tumour_p80_DEL_snvs.vcf
-rw-r--r-- 1 user group 5.5M Jul 19 16:07 tumour_p80_DEL_sv_extract_sorted.bam
-rw-r--r-- 1 user group 9.2K Jul 19 16:07 tumour_p80_DEL_sv_extract_sorted.bam.bai
-rw-r--r-- 1 user group 2.2K Jul 19 16:07 tumour_p80_DEL_svs_simple.txt

[user@cn3144]$ input=tumour_p80_DEL_svs_simple.txt
[user@cn3144]$ bam=tumour_p80_DEL_sv_extract_sorted.bam
[user@cn3144]$ sample=tumour_p80_DEL

[user@cn3144]$ # Annotate break points
[user@cn3144]$ SVclone.py annotate -i $input -b $bam -s $sample --sv_format simple -cfg svclone_test.ini
Loading SV calls...
Inferring SV directions...
Classifying SVs...
Writing SV output...

[user@cn3144]$ # Count breakpoint reads
[user@cn3144]$ SVclone.py count -i ${sample}/${sample}_svin.txt -b $bam -s $sample -cfg svclone_test.ini
...
Extracting data from 100 SVs

[user@cn3144]$ # Filtering out low confidence breaks
[user@cn3144]$ SVclone.py filter -s $sample -i ${sample}/${sample}_svinfo.txt -p example_data/purity_ploidy.txt -cfg svclone_test.ini
WARNING: No purity/ploidy file found. Assuming purity = 1.000000, ploidy = 2.000000
Filtered out 0 SVs based on size limits
Filtered out 0 SVs based on minimum depth limit
Filtered out 0 SVs based on spanning/split read limits
No CNV input defined, assuming all loci major/minor allele copy-numbers are ploidy/2
Final filtered SV count: 100

[user@cn3144]$ # Cluster SVs
[user@cn3144]$ SVclone.py cluster -s $sample -cfg svclone_test.ini
Clustering 100 SVs...
Thread 0; cluster run: 0
Thread 1; cluster run: 1
Thread 2; cluster run: 2
Thread 3; cluster run: 3
Random seed for run 0 is 4013
Random seed for run 2 is 2067
Random seed for run 1 is 678
Random seed for run 3 is 1681
phi lower limit: 0.039231; phi upper limit: 1.000000
Dirichlet concentration gamma prior values: alpha = 0.100000; beta= 0.500000; init = 0.200000
...
  clus_id  size   phi
0       0   100  0.82
...
  clus_id  size     phi
0       0    53  0.7923
1       1    47  0.8436
Compiling and writing output...
Selecting run 0 as best run for SVs

[user@cn3144]$ # post-assign variants
[user@cn3144]$ SVclone.py post_assign -s $sample --svs ${sample}/${sample}_svinfo.txt -cfg svclone_test.ini
No CNV input defined, assuming all loci major/minor allele copy-numbers are ploidy/2
Reassigning all 100 variants
Writing output to tumour_p80_DEL/best_run_svs_post_assign/ for SVs

[user@cn3144]$ # diagnostic plots
[user@cn3144]$ Rscript $SVCLONE_RSCRIPTS/post_process_fit_diagnostics.R $sample $sample --map
[1] "Finished plotting diagnostic plots!"
[user@cn3144]$ ls -lh tumour_p80_DEL
total 196K
drwxr-xr-x 2 user group 4.0K Jul 19 16:12 best_run_svs
drwxr-xr-x 2 user group 4.0K Jul 19 16:12 best_run_svs_post_assign
-rw-r--r-- 1 user group   54 Jul 19 16:10 purity_ploidy.txt
-rw-r--r-- 1 user group   80 Jul 19 16:09 read_params.txt
drwxr-xr-x 2 user group 4.0K Jul 19 16:12 run0
drwxr-xr-x 2 user group 4.0K Jul 19 16:12 run1
drwxr-xr-x 2 user group 4.0K Jul 19 16:11 run2
drwxr-xr-x 2 user group 4.0K Jul 19 16:12 run3
-rw-r--r-- 1 user group 7.9K Jul 19 16:13 tumour_p80_DEL_best_run_svs_post_assign_best_fit.pdf
-rw-r--r-- 1 user group 4.5K Jul 19 16:13 tumour_p80_DEL_cluster_hist.pdf
-rw-r--r-- 1 user group  23K Jul 19 16:12 tumour_p80_DEL_filtered_svs_post_assign.tsv
-rw-r--r-- 1 user group  22K Jul 19 16:10 tumour_p80_DEL_filtered_svs.tsv
-rw-r--r-- 1 user group  126 Jul 19 16:13 tumour_p80_DEL_ic_metrics.csv
-rw-r--r-- 1 user group 8.4K Jul 19 16:13 tumour_p80_DEL_run0_fit.pdf
-rw-r--r-- 1 user group 7.9K Jul 19 16:13 tumour_p80_DEL_run1_fit.pdf
-rw-r--r-- 1 user group 7.9K Jul 19 16:13 tumour_p80_DEL_run2_fit.pdf
-rw-r--r-- 1 user group 8.3K Jul 19 16:13 tumour_p80_DEL_run3_fit.pdf
-rw-r--r-- 1 user group  31K Jul 19 16:13 tumour_p80_DEL_run_summary.pdf
-rw-r--r-- 1 user group  16K Jul 19 16:10 tumour_p80_DEL_svinfo.txt
-rw-r--r-- 1 user group 5.1K Jul 19 16:09 tumour_p80_DEL_svin.txt

[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
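Note that the test data also includes an SNV VCF (tumour_p80_DEL_snvs.vcf) and a co-clustering config (svclone_test_coclus.ini) that the session above does not use. Per the SVclone description quoted earlier, SNVs are added at the filter step. A hedged sketch of what that could look like is below; the --snvs and --snv_format flags are taken from the upstream SVclone README rather than from this session, and the format value is a placeholder, so check SVclone.py filter --help on your installed version before relying on it:

# <format> is a placeholder; use one of the formats listed by SVclone.py filter --help
SVclone.py filter -s $sample -i ${sample}/${sample}_svinfo.txt \
    --snvs tumour_p80_DEL_snvs.vcf --snv_format <format> \
    -cfg svclone_test_coclus.ini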
Note that the number of threads used by the clustering algorithm is set in the configuration file.
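The key name can differ between SVclone versions, so the quickest way to find and adjust it is to search the config file you pass with -cfg, for example:

[user@cn3144]$ grep -in thread svclone_test.ini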
Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. svclone.sh), similar to the following example, which runs the test data set. Note that the number of threads is set in the svclone_test.ini config file; the test config launches four clustering threads, so allocate CPUs to match.
#!/bin/bash
set -e

ml svclone
cp -L $SVCLONE_TEST_DATA/* .

input=tumour_p80_DEL_svs_simple.txt
bam=tumour_p80_DEL_sv_extract_sorted.bam
sample=tumour_p80_DEL

# Annotate breakpoints, count variant/non-variant reads, then filter.
# The purity/ploidy file was copied to the working directory above.
SVclone.py annotate -i $input -b $bam -s $sample --sv_format simple -cfg svclone_test.ini
SVclone.py count -i ${sample}/${sample}_svin.txt -b $bam -s $sample -cfg svclone_test.ini
SVclone.py filter -s $sample -i ${sample}/${sample}_svinfo.txt -p purity_ploidy.txt -cfg svclone_test.ini

# Cluster, post-assign, and generate diagnostic plots
SVclone.py cluster -s $sample -cfg svclone_test.ini
SVclone.py post_assign -s $sample --svs ${sample}/${sample}_svinfo.txt -cfg svclone_test.ini
Rscript $SVCLONE_RSCRIPTS/post_process_fit_diagnostics.R $sample $sample --map
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=4 --mem=10g svclone.sh