SignatureAnalyzer: identification of somatic mutational signatures

SignatureAnalyzer is a tool for the identification of somatic mutational signatures using Bayesian non-negative matrix factorization (NMF).
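By default (see the help output in the session below), SignatureAnalyzer runs ARD-NMF (automatic relevance determination NMF) with a Poisson objective, L1 (exponential) priors on both W and H, and hyperparameters a, b, and phi. The following is a minimal NumPy sketch of that model. It assumes the multiplicative majorization-minimization updates of Tan and Févotte (2013) rather than SignatureAnalyzer's own implementation, whose help text refers to a CAVI algorithm. V is laid out as features x samples, so a samples x features input would be transposed first. The function names, the moment-matching default for b, and the rule for counting active components are illustrative assumptions, not the tool's API.

```python
import numpy as np


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL (Poisson) divergence D(V | V_hat), summed over entries."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))


def default_b(V, a, K0):
    """Moment-matching b for exponential (L1) priors on W and H (an assumption).

    With E[w] = E[h] = lambda and lambda ~ InverseGamma(a, b),
    mean(V) = K0 * b**2 / ((a - 1) * (a - 2)), solved here for b.
    """
    if a <= 2:
        raise ValueError("this choice of b needs a > 2")
    return float(np.sqrt((a - 1) * (a - 2) * V.mean() / K0))


def ard_nmf_kl_l1(V, K0, a=10.0, b=None, phi=1.0, tol=1e-7, max_iter=10000,
                  active_thresh=0.01, seed=0, eps=1e-12):
    """ARD-NMF sketch: Poisson (KL) objective, L1 priors on W and H.

    V is an (F features x N samples) non-negative matrix. Returns W (F x K0),
    H (K0 x N), the relevance weights lam (K0,) and a boolean mask of the
    components still considered active.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    if b is None:
        b = default_b(V, a, K0)
    if b <= 0:
        raise ValueError("b must be positive")
    # F + N exponential terms plus the InverseGamma(a, b) prior on each lambda_k
    c = F + N + a + 1.0
    W = rng.uniform(0.1, 1.0, size=(F, K0))
    H = rng.uniform(0.1, 1.0, size=(K0, N))
    lam = (W.sum(axis=0) + H.sum(axis=1) + b) / c

    for _ in range(max_iter):
        # Multiplicative update of H; phi / lam_k shrinks irrelevant components
        R = V / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + phi / lam[:, None])
        # Multiplicative update of W using the refreshed H
        R = V / (W @ H + eps)
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + phi / lam[None, :])
        # Closed-form update of each relevance weight lambda_k
        lam_new = (W.sum(axis=0) + H.sum(axis=1) + b) / c
        # Max relative change in lambda (assumed analogue of the logged "del")
        delta = np.max(np.abs(lam_new - lam) / lam)
        lam = lam_new
        if delta < tol:
            break

    # Pruned components collapse towards the floor lambda_k = b / c
    active = lam > (b / c) * (1.0 + active_thresh)
    return W, H, lam, active
```

For a count matrix `S` of shape samples x 96, `ard_nmf_kl_l1(S.T, K0=96)` mirrors the tool's default of starting from K0 = number of features, and `active.sum()` plays roughly the role of the K values printed in the logs. Smaller values of `a` give sparser solutions, consistent with the help text.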

References:

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=12g -c8 --gres=lscratch:20
[user@cn3335 ~]$ module load SignatureAnalyzer
[+] Loading singularity  4.0.1  on cn3335
[+] Loading SignatureAnalyzer  20240208
[user@cn3335 ~]$ signatureanalyzer -h
usage: signatureanalyzer [-h] [-t {maf,spectra,matrix}] [-n NRUNS] [-o OUTDIR]
                         [--reference {cosmic2,cosmic3,cosmic3_exome,cosmic3_DBS,cosmic3_ID,cosmic3_TSB,pcawg_SBS,pcawg_COMPOSITE,pcawg_COMPOSITE96,pcawg_SBS_ID,pcawg_SBS96_ID,polymerase_msi,polymerase_msi96}]
                         [--hg_build HG_BUILD] [--cuda_int CUDA_INT] [--verbose] [--K0 K0] [--max_iter MAX_ITER] [--del_ DEL_]
                         [--tolerance TOLERANCE] [--phi PHI] [--a A] [--b B] [--objective {poisson,gaussian}] [--prior_on_W {L1,L2}]
                         [--prior_on_H {L1,L2}] [--report_freq REPORT_FREQ] [--active_thresh ACTIVE_THRESH]
                         [--random_seed RANDOM_SEED] [--cut_norm CUT_NORM] [--cut_diff CUT_DIFF]
                         input

Signature Analyzer GPU.

positional arguments:
  input                 Input matrix for decomposition. Signature Analyzer uses the format of (samples x features)
                        Assumes input is a .maf by default and will compute the 96-base context spectra if not provided
                          * Use {-type} to specific different input types

options:
  -h, --help            show this help message and exit
  -t {maf,spectra,matrix}, --type {maf,spectra,matrix}
                        Input type. Specify whether input is a .maf, a 96 base context spectra, or an RNA expression matrix (default: 'maf')
                          * NOTE: for expression is is reccomended to use log-transformed & gaussian {--objective} function
  -n NRUNS, --nruns NRUNS
                        Number of iterations to run ARD-NMF. Significant speed up if GPU is available (default: 10)
  -o OUTDIR, --outdir OUTDIR
                        Directory to save outputs (default: '.')
  --reference {cosmic2,cosmic3,cosmic3_exome,cosmic3_DBS,cosmic3_ID,cosmic3_TSB,pcawg_SBS,pcawg_COMPOSITE,pcawg_COMPOSITE96,pcawg_SBS_ID,pcawg_SBS96_ID,polymerase_msi,polymerase_msi96}
                        Cosmic signatures to map to and provide results for. Support for Cosmic 2 & 3 (default: 'cosmic2')
                          * Reference: https://cancer.sanger.ac.uk/cosmic/signatures
  --hg_build HG_BUILD   Path to 2bit, human genome build for mapping mutational contexts. Required if mutational context is not provided (default: None)
  --cuda_int CUDA_INT   GPU to use. Defaults to (cuda: 0). If (None), or if no GPU is available, will default to CPU (default: 0)
  --verbose             Verbosity
  --K0 K0               Initial K0 parameter. If not provided, ARD-NMF starts with K0 = no. features
  --max_iter MAX_ITER   Maximum number of iterations for CAVI algorithm if not reached {--tolerance} (default: 10000)
  --del_ DEL_           Early stop condition based on lambda change (default: 1)
  --tolerance TOLERANCE
                        Early stop condition based on max lambda entry (default: 1e-10)
  --phi PHI             Dispersion parameter for CAVI. See paper for details on selection (default: 1).
                           * NOTE: If using gaussian {--objective}, scaled by the variance of input matrix
  --a A                 Hyperparamter for lambda. We recommend trying various values of a. Smaller values will result in
                        sparser results. Reccommended starting hyperparameter: a = log(F+N). (default: 10.0)
  --b B                 Hyperparamter for lambda. Default is computed automatically as speicified by Tan and Fevotte 2013
  --objective {poisson,gaussian}
                        Objective function for ARD-NMF. (default: 'poisson')
                          * mutational signatures --> poisson (DEFAULT)
                          * log-norm expression   --> gaussian
  --prior_on_W {L1,L2}  Prior on W matrix L1 (exponential) or L2 (half-normal) (default: 'L1')
  --prior_on_H {L1,L2}  Prior on H matrix L1 (exponential) or L2 (half-normal) (default: 'L1')
  --report_freq REPORT_FREQ
                        Number of iterations between progress reports (default: 250)
  --active_thresh ACTIVE_THRESH
                        Active threshold for consdiering a threshold relevant (default: 0.01)
  --random_seed RANDOM_SEED
                        Random seed for decomposition
  --cut_norm CUT_NORM   Min normalized value for mean signature. Used in marker selection during post-processing (matrix). (default: 0.5)
  --cut_diff CUT_DIFF   Difference between mean selected signature and mean unselected signatures for marker selection (matrix). (default: 1.0)
[user@cn3335 ~]$ git clone https://github.com/getzlab/SignatureAnalyzer getzlab-SignatureAnalyzer
[user@cn3335 ~]$ signatureanalyzer \
        getzlab-SignatureAnalyzer/examples/example_luad_spectra_1.tsv \
        --type spectra \
        --nruns 2 \
        --max_iter 100 \
        --outdir . 
---------------------------------------------------------
---------- S I G N A T U R E  A N A L Y Z E R  ----------
---------------------------------------------------------
   * Using cosmic2 signatures
   * Saving ARD-NMF outputs to ./nmf_output.h5
   * Running ARD-NMF...
        0/1: nit=  100 K=11     del=0.10181770
        1/1: nit=  100 K=12     del=0.01765841
   * Run 0 had lowest objective with mode (n=1) K = 11.
   * Saving report plots to .
connect localhost port 6000: Connection refused
Plotting cosmic2 Attributions Barplot:
Plotting K Histogram:
Plotting cosmic2 Cosine Similarity:
[user@cn3335 ~]$ ls *pdf
cosine_similarity_plot.pdf  k_dist.pdf  signature_contributions.pdf  signature_stacked_barplot.pdf
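With `--type spectra`, the input is a 96-channel single-base-substitution spectrum per sample (samples x features, per the help text); with a .maf input, SignatureAnalyzer computes these spectra itself, using `--hg_build` if the mutational context is not provided. The sketch below shows, in plain Python, how such counts are conventionally built from (trinucleotide context, ref, alt) records: each substitution is reported on the pyrimidine strand, giving 6 classes x 16 flanking-base pairs = 96 channels. The COSMIC-style labels such as `A[C>A]A`, the channel order, and the helper names are assumptions; check the column names of the example spectra file used above before writing your own input.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# The six substitution classes, written with a pyrimidine (C or T) reference
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def channel_labels():
    """96 labels such as 'A[C>A]A': 6 classes x 4 five-prime x 4 three-prime bases."""
    return [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product(BASES, BASES)]


def to_channel(context, ref, alt):
    """Map a trinucleotide context (reference base in the middle) and the
    ref/alt alleles to a pyrimidine-strand channel label."""
    context, ref, alt = context.upper(), ref.upper(), alt.upper()
    valid = set(BASES)
    if (len(context) != 3 or context[1] != ref
            or ref not in valid or alt not in valid or ref == alt):
        raise ValueError(f"bad record: {context} {ref}>{alt}")
    if ref in "AG":
        # Purine reference: report the reverse complement instead
        context = context.translate(COMPLEMENT)[::-1]
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


def spectrum(mutations):
    """Count (context, ref, alt) records into one 96-length vector in label order."""
    counts = Counter(to_channel(*m) for m in mutations)
    return [counts[label] for label in channel_labels()]
```

Stacking one `spectrum(...)` row per sample gives the samples x 96 layout described in the help text.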

[user@cn3335 ~]$ git clone https://github.com/broadinstitute/SignatureAnalyzer-GPU SignatureAnalyzerGPU
[user@cn3335 ~]$ sed -i 's|\.NMF_functions|NMF_functions|g' SignatureAnalyzerGPU/ARD_NMF.py
[user@cn3335 ~]$ shell
Singularity> python SignatureAnalyzerGPU/SignatureAnalyzer-GPU.py \
         --data SignatureAnalyzerGPU/example_data/POLEMSI_counts_matrix.txt \
         --prior_on_W L1 --prior_on_H L2 \
         --output_dir . \
         --parameters_file SignatureAnalyzerGPU/example_data/POLEMSI_params.txt \
         --max_iter 20000 \
         --labeled \
         --tolerance 1e-7
Reading data frame from SignatureAnalyzerGPU/example_data/POLEMSI_counts_matrix.txt
NMF class initalized.
0
NMF data and parameters set.
   * Using GPU: cuda:0
%%%%%%%%%%%%%%%
a = 10
b = 0.00029431690916802025
%%%%%%%%%%%%%%%
nit=    0 K=   96 | obj=349976.25       b_div=532412.19 lam=12.45       del=0.99999708  sumW=22351.09   sumH=3212.82
nit=  100 K=   13 | obj=-2084853.38     b_div=138738.62 lam=4.56        del=0.14132172  sumW=5573.62    sumH=1549.88
nit=  200 K=   10 | obj=-2149350.25     b_div=142379.19 lam=4.43        del=0.00341199  sumW=5411.34    sumH=1310.39
nit=  300 K=   10 | obj=-2152107.50     b_div=143538.61 lam=4.32        del=0.20345686  sumW=5264.86    sumH=1300.05
nit=  400 K=    9 | obj=-2173389.25     b_div=143575.67 lam=4.31        del=0.00034595  sumW=5238.82    sumH=1251.14
nit=  500 K=    9 | obj=-2173472.00     b_div=143554.73 lam=4.31        del=0.00022273  sumW=5232.94    sumH=1250.67
nit=  600 K=    9 | obj=-2173527.00     b_div=143521.78 lam=4.31        del=0.00028199  sumW=5231.32    sumH=1249.95
nit=  700 K=    9 | obj=-2173554.50     b_div=143527.19 lam=4.31        del=0.00013176  sumW=5229.51    sumH=1249.37
nit=  800 K=    9 | obj=-2173565.25     b_div=143532.72 lam=4.31        del=0.00005945  sumW=5227.17    sumH=1249.29
nit=  900 K=    9 | obj=-2173570.50     b_div=143537.41 lam=4.31        del=0.00002979  sumW=5226.03    sumH=1249.15
nit= 1000 K=    9 | obj=-2173573.50     b_div=143539.06 lam=4.31        del=0.00002086  sumW=5225.63    sumH=1248.88
nit= 1100 K=    9 | obj=-2173575.50     b_div=143540.06 lam=4.31        del=0.00001453  sumW=5225.83    sumH=1248.56
nit= 1200 K=    9 | obj=-2173576.50     b_div=143541.23 lam=4.31        del=0.00001465  sumW=5226.21    sumH=1248.27
nit= 1300 K=    9 | obj=-2173578.25     b_div=143541.31 lam=4.31        del=0.00001763  sumW=5226.56    sumH=1248.04
nit= 1400 K=    9 | obj=-2173579.50     b_div=143539.91 lam=4.31        del=0.00001517  sumW=5227.53    sumH=1247.71
nit= 1500 K=    9 | obj=-2173580.00     b_div=143538.98 lam=4.31        del=0.00001028  sumW=5228.32    sumH=1247.49
...
[user@cn3335 ~]$ exit
[user@biowulf]$
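The progress lines printed by SignatureAnalyzer-GPU (`nit`, `K`, `obj`, `b_div`, `lam`, `del`, ...) are easy to collect if the run's output is saved to a file, for example with `tee`. Below is a small, hypothetical standard-library helper that turns such lines into records, for instance to plot `obj` or `K` against `nit`, or to see how `del` approaches the `--tolerance` value. The line format is taken from the sample output above and may differ between versions.

```python
import re

# key=value pairs as printed in progress lines, e.g. "K=   13" or "del=0.14132172"
PAIR = re.compile(r"(\w+)=\s*([-+]?[\d.]+(?:[eE][-+]?\d+)?)")


def parse_progress(lines):
    """Turn ARD-NMF progress lines (those containing 'nit=') into dicts of floats.

    Other output, such as "a = 10" or "* Using GPU: cuda:0", is skipped.
    """
    records = []
    for line in lines:
        if "nit=" not in line:
            continue
        records.append({key: float(val) for key, val in PAIR.findall(line)})
    return records
```

Because `parse_progress` accepts any iterable of strings, an open file handle can be passed directly, e.g. `parse_progress(open("run.log"))`, where `run.log` is a hypothetical saved log.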