SignatureAnalyzer is a tool for the identification of somatic mutational signatures using Bayesian non-negative matrix factorization (NMF).
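The help text below mentions a relevance weight lambda, the hyperparameters --a and --b, and a dispersion --phi, and it cites Tan and Fevotte (2013) for the default b. The block below restates the maximum a posteriori ARD-NMF criterion from that paper as a sketch of the published model. SignatureAnalyzer's own implementation may differ in details. The notation is assumed here. V is the F x N nonnegative input (features by samples; the CLI itself reads samples x features). W (F x K) holds the signatures, with columns w_k. H (K x N) holds the activities, with rows h_k. The reconstruction is V-hat = WH.

```latex
\begin{aligned}
C(W,H,\boldsymbol{\lambda}) &= \frac{1}{\phi}\, D_\beta(V \mid WH)
  + \sum_{k=1}^{K} \left[ \frac{f(\mathbf{w}_k) + f(\mathbf{h}_k) + b}{\lambda_k} + c \log \lambda_k \right] \\
D_1(V \mid \hat{V}) &= \sum_{f,n} \left( v_{fn} \log \frac{v_{fn}}{\hat{v}_{fn}} - v_{fn} + \hat{v}_{fn} \right)
  && \text{(Poisson / KL, \texttt{--objective poisson})} \\
D_2(V \mid \hat{V}) &= \tfrac{1}{2} \sum_{f,n} \left( v_{fn} - \hat{v}_{fn} \right)^2
  && \text{(Gaussian, \texttt{--objective gaussian})} \\
f(\mathbf{x}) &= \lVert \mathbf{x} \rVert_1, \quad c = F + N + a + 1
  && \text{(L1, exponential prior)} \\
f(\mathbf{x}) &= \tfrac{1}{2} \lVert \mathbf{x} \rVert_2^2, \quad c = \tfrac{F+N}{2} + a + 1
  && \text{(L2, half-normal prior)} \\
\lambda_k &\leftarrow \frac{f(\mathbf{w}_k) + f(\mathbf{h}_k) + b}{c}
\end{aligned}
```

The last line comes from minimizing C over lambda_k with W and H held fixed. When both w_k and h_k shrink to zero, lambda_k falls to its floor b/c, and component k is effectively switched off. This pruning is how the active K decreases from its starting value during a run (K0, which by default equals the number of features). In the GPU log below, for example, K drops from 96 to 9. The expressions assume the same prior type on W and H, although the CLI allows --prior_on_W and --prior_on_H to differ.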
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=12g -c8 --gres=lscratch:20
[user@cn3335 ~]$ module load SignatureAnalyzer
[+] Loading singularity 4.0.1 on cn3335
[+] Loading SignatureAnalyzer 20240208
[user@cn3335 ~]$ signatureanalyzer -h
usage: signatureanalyzer [-h] [-t {maf,spectra,matrix}] [-n NRUNS] [-o OUTDIR]
[--reference {cosmic2,cosmic3,cosmic3_exome,cosmic3_DBS,cosmic3_ID,cosmic3_TSB,pcawg_SBS,pcawg_COMPOSITE,pcawg_COMPOSITE96,pcawg_SBS_ID,pcawg_SBS96_ID,polymerase_msi,polymerase_msi96}]
[--hg_build HG_BUILD] [--cuda_int CUDA_INT] [--verbose] [--K0 K0] [--max_iter MAX_ITER] [--del_ DEL_]
[--tolerance TOLERANCE] [--phi PHI] [--a A] [--b B] [--objective {poisson,gaussian}] [--prior_on_W {L1,L2}]
[--prior_on_H {L1,L2}] [--report_freq REPORT_FREQ] [--active_thresh ACTIVE_THRESH]
[--random_seed RANDOM_SEED] [--cut_norm CUT_NORM] [--cut_diff CUT_DIFF]
input
Signature Analyzer GPU.
positional arguments:
input Input matrix for decomposition. Signature Analyzer uses the format of (samples x features)
Assumes input is a .maf by default and will compute the 96-base context spectra if not provided
* Use {-type} to specific different input types
options:
-h, --help show this help message and exit
-t {maf,spectra,matrix}, --type {maf,spectra,matrix}
Input type. Specify whether input is a .maf, a 96 base context spectra, or an RNA expression matrix (default: 'maf')
* NOTE: for expression is is reccomended to use log-transformed & gaussian {--objective} function
-n NRUNS, --nruns NRUNS
Number of iterations to run ARD-NMF. Significant speed up if GPU is available (default: 10)
-o OUTDIR, --outdir OUTDIR
Directory to save outputs (default: '.')
--reference {cosmic2,cosmic3,cosmic3_exome,cosmic3_DBS,cosmic3_ID,cosmic3_TSB,pcawg_SBS,pcawg_COMPOSITE,pcawg_COMPOSITE96,pcawg_SBS_ID,pcawg_SBS96_ID,polymerase_msi,polymerase_msi96}
Cosmic signatures to map to and provide results for. Support for Cosmic 2 & 3 (default: 'cosmic2')
* Reference: https://cancer.sanger.ac.uk/cosmic/signatures
--hg_build HG_BUILD Path to 2bit, human genome build for mapping mutational contexts. Required if mutational context is not provided (default: None)
--cuda_int CUDA_INT GPU to use. Defaults to (cuda: 0). If (None), or if no GPU is available, will default to CPU (default: 0)
--verbose Verbosity
--K0 K0 Initial K0 parameter. If not provided, ARD-NMF starts with K0 = no. features
--max_iter MAX_ITER Maximum number of iterations for CAVI algorithm if not reached {--tolerance} (default: 10000)
--del_ DEL_ Early stop condition based on lambda change (default: 1)
--tolerance TOLERANCE
Early stop condition based on max lambda entry (default: 1e-10)
--phi PHI Dispersion parameter for CAVI. See paper for details on selection (default: 1).
* NOTE: If using gaussian {--objective}, scaled by the variance of input matrix
--a A Hyperparamter for lambda. We recommend trying various values of a. Smaller values will result in
sparser results. Reccommended starting hyperparameter: a = log(F+N). (default: 10.0)
--b B Hyperparamter for lambda. Default is computed automatically as speicified by Tan and Fevotte 2013
--objective {poisson,gaussian}
Objective function for ARD-NMF. (default: 'poisson')
* mutational signatures --> poisson (DEFAULT)
* log-norm expression --> gaussian
--prior_on_W {L1,L2} Prior on W matrix L1 (exponential) or L2 (half-normal) (default: 'L1')
--prior_on_H {L1,L2} Prior on H matrix L1 (exponential) or L2 (half-normal) (default: 'L1')
--report_freq REPORT_FREQ
Number of iterations between progress reports (default: 250)
--active_thresh ACTIVE_THRESH
Active threshold for consdiering a threshold relevant (default: 0.01)
--random_seed RANDOM_SEED
Random seed for decomposition
--cut_norm CUT_NORM Min normalized value for mean signature. Used in marker selection during post-processing (matrix). (default: 0.5)
--cut_diff CUT_DIFF Difference between mean selected signature and mean unselected signatures for marker selection (matrix). (default: 1.0)
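If you run the same analysis repeatedly, you can combine the options above in a small wrapper. The sketch below is illustrative and is not a recommended configuration. The function name run_sa_spectra is hypothetical, and the choices of --reference cosmic3 and a default seed of 1234 are arbitrary; --nruns 10 matches the documented default. Every flag it passes appears in the help output above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the signatureanalyzer CLI for 96-context spectra input.
# Assumes `module load SignatureAnalyzer` has already put signatureanalyzer on PATH.
run_sa_spectra() {
    local input="$1"          # spectra file in samples x features layout (e.g. a .tsv)
    local outdir="$2"         # where nmf_output.h5 and the report PDFs are written
    local seed="${3:-1234}"   # value passed to --random_seed (1234 is an arbitrary default)

    mkdir -p "$outdir"
    signatureanalyzer "$input" \
        --type spectra \
        --reference cosmic3 \
        --nruns 10 \
        --random_seed "$seed" \
        --outdir "$outdir"
}
```

For example, after cloning the repository as shown next: run_sa_spectra getzlab-SignatureAnalyzer/examples/example_luad_spectra_1.tsv luad_out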
[user@cn3335 ~]$ git clone https://github.com/getzlab/SignatureAnalyzer getzlab-SignatureAnalyzer
[user@cn3335 ~]$ signatureanalyzer \
getzlab-SignatureAnalyzer/examples/example_luad_spectra_1.tsv \
--type spectra \
--nruns 2 \
--max_iter 100 \
--outdir .
---------------------------------------------------------
---------- S I G N A T U R E A N A L Y Z E R ----------
---------------------------------------------------------
* Using cosmic2 signatures
* Saving ARD-NMF outputs to ./nmf_output.h5
* Running ARD-NMF...
0/1: nit= 100 K=11 del=0.10181770
1/1: nit= 100 K=12 del=0.01765841
* Run 0 had lowest objective with mode (n=1) K = 11.
* Saving report plots to .
connect localhost port 6000: Connection refused
Plotting cosmic2 Attributions Barplot:
Plotting K Histogram:
Plotting cosmic2 Cosine Similarity:
[user@cn3335 ~]$ ls *pdf
cosine_similarity_plot.pdf k_dist.pdf signature_contributions.pdf signature_stacked_barplot.pdf
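The k_dist.pdf report summarizes how many runs ended at each number of signatures K. If you save the console output (for example with `| tee sa.log`), you can pull a similar tally from per-run lines such as `0/1: nit= 100 K=11 del=0.10181770`. The function name k_tally is hypothetical, and the line format is assumed from the sample output above. The function keeps the last K reported for each run index, in case a longer run prints more than one line per run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many ARD-NMF runs ended at each K.
# Reads log files given as arguments, or stdin if none are given.
k_tally() {
    awk '
        # Per-run lines start with "<run>/<last run>:", e.g. "0/1: nit= 100 K=11 ..."
        match($0, /^[0-9]+\/[0-9]+:/) {
            run = substr($0, 1, index($0, "/") - 1)
            # Remember the most recent K reported for this run
            if (match($0, /K=[0-9]+/)) last[run] = substr($0, RSTART + 2, RLENGTH - 2)
        }
        END {
            for (r in last) count[last[r]]++
            for (k in count) print k, count[k]
        }
    ' "$@" | sort -n
}
```

Usage: `k_tally sa.log` prints one "K count" pair per line. The K with the highest count is the mode. In the sample session each K occurred once, and the tool reported run 0 (K = 11) as the lowest-objective run.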
[user@cn3335 ~]$ git clone https://github.com/broadinstitute/SignatureAnalyzer-GPU SignatureAnalyzerGPU
[user@cn3335 ~]$ sed -i 's|\.NMF_functions|NMF_functions|g' SignatureAnalyzerGPU/ARD_NMF.py
[user@cn3335 ~]$ shell
Singularity> python SignatureAnalyzerGPU/SignatureAnalyzer-GPU.py \
--data SignatureAnalyzerGPU/example_data/POLEMSI_counts_matrix.txt \
--prior_on_W L1 --prior_on_H L2 \
--output_dir . \
--parameters_file SignatureAnalyzerGPU/example_data/POLEMSI_params.txt \
--max_iter 20000 \
--labeled \
--tolerance 1e-7
Reading data frame from SignatureAnalyzerGPU/example_data/POLEMSI_counts_matrix.txt
NMF class initalized.
0
NMF data and parameters set.
* Using GPU: cuda:0
%%%%%%%%%%%%%%%
a = 10
b = 0.00029431690916802025
%%%%%%%%%%%%%%%
nit= 0 K= 96 | obj=349976.25 b_div=532412.19 lam=12.45 del=0.99999708 sumW=22351.09 sumH=3212.82
nit= 100 K= 13 | obj=-2084853.38 b_div=138738.62 lam=4.56 del=0.14132172 sumW=5573.62 sumH=1549.88
nit= 200 K= 10 | obj=-2149350.25 b_div=142379.19 lam=4.43 del=0.00341199 sumW=5411.34 sumH=1310.39
nit= 300 K= 10 | obj=-2152107.50 b_div=143538.61 lam=4.32 del=0.20345686 sumW=5264.86 sumH=1300.05
nit= 400 K= 9 | obj=-2173389.25 b_div=143575.67 lam=4.31 del=0.00034595 sumW=5238.82 sumH=1251.14
nit= 500 K= 9 | obj=-2173472.00 b_div=143554.73 lam=4.31 del=0.00022273 sumW=5232.94 sumH=1250.67
nit= 600 K= 9 | obj=-2173527.00 b_div=143521.78 lam=4.31 del=0.00028199 sumW=5231.32 sumH=1249.95
nit= 700 K= 9 | obj=-2173554.50 b_div=143527.19 lam=4.31 del=0.00013176 sumW=5229.51 sumH=1249.37
nit= 800 K= 9 | obj=-2173565.25 b_div=143532.72 lam=4.31 del=0.00005945 sumW=5227.17 sumH=1249.29
nit= 900 K= 9 | obj=-2173570.50 b_div=143537.41 lam=4.31 del=0.00002979 sumW=5226.03 sumH=1249.15
nit= 1000 K= 9 | obj=-2173573.50 b_div=143539.06 lam=4.31 del=0.00002086 sumW=5225.63 sumH=1248.88
nit= 1100 K= 9 | obj=-2173575.50 b_div=143540.06 lam=4.31 del=0.00001453 sumW=5225.83 sumH=1248.56
nit= 1200 K= 9 | obj=-2173576.50 b_div=143541.23 lam=4.31 del=0.00001465 sumW=5226.21 sumH=1248.27
nit= 1300 K= 9 | obj=-2173578.25 b_div=143541.31 lam=4.31 del=0.00001763 sumW=5226.56 sumH=1248.04
nit= 1400 K= 9 | obj=-2173579.50 b_div=143539.91 lam=4.31 del=0.00001517 sumW=5227.53 sumH=1247.71
nit= 1500 K= 9 | obj=-2173580.00 b_div=143538.98 lam=4.31 del=0.00001028 sumW=5228.32 sumH=1247.49
...
[user@cn3335 ~]$ exit
[user@biowulf]$
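The SignatureAnalyzer-GPU script prints progress lines of the form `nit= 100 K= 13 | obj=... del=...`. To review convergence afterwards, for example after redirecting the script's output to a file, you can extract the iteration count, the active K and del into a small table. The function name trace_k is hypothetical, and the field layout is assumed from the sample output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn SignatureAnalyzer-GPU progress lines into "nit K del" rows.
# Reads log files given as arguments, or stdin if none are given.
trace_k() {
    awk '
        /^ *nit=/ {
            line = $0
            nit = k = del = ""
            # Iteration number, e.g. "nit= 100" (tolerates any amount of padding)
            if (match(line, /nit= *[0-9]+/)) { nit = substr(line, RSTART, RLENGTH); sub(/nit= */, "", nit) }
            # Number of active components, e.g. "K= 13"
            if (match(line, /K= *[0-9]+/)) { k = substr(line, RSTART, RLENGTH); sub(/K= */, "", k) }
            # Lambda-change value, e.g. "del=0.14132172"
            if (match(line, /del=[0-9.eE+-]+/)) del = substr(line, RSTART + 4, RLENGTH - 4)
            print nit, k, del
        }
    ' "$@"
}
```

For example, append `2>&1 | tee gpu.log` to the python command above, then run `trace_k gpu.log`. In the sample run, K settles at 9 by iteration 400 while del continues to shrink.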