SignatureAnalyzer: identification of somatic mutational signatures

SignatureAnalyzer identifies somatic mutational signatures using Bayesian non-negative matrix factorization (NMF).


Interactive job
Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=12g -c8 --gres=lscratch:20
[user@cn3335 ~]$ module load SignatureAnalyzer
[+] Loading singularity  4.0.1  on cn3335
[+] Loading SignatureAnalyzer  20240208
[user@cn3335 ~]$ signatureanalyzer -h
usage: signatureanalyzer [-h] [-t {maf,spectra,matrix}] [-n NRUNS] [-o OUTDIR]
                         [--reference {cosmic2,cosmic3,cosmic3_exome,cosmic3_DBS,cosmic3_ID,cosmic3_TSB,pcawg_SBS,pcawg_COMPOSITE,pcawg_COMPOSITE96,pcawg_SBS_ID,pcawg_SBS96_ID,polymerase_msi,polymerase_msi96}]
                         [--hg_build HG_BUILD] [--cuda_int CUDA_INT] [--verbose] [--K0 K0] [--max_iter MAX_ITER] [--del_ DEL_]
                         [--tolerance TOLERANCE] [--phi PHI] [--a A] [--b B] [--objective {poisson,gaussian}] [--prior_on_W {L1,L2}]
                         [--prior_on_H {L1,L2}] [--report_freq REPORT_FREQ] [--active_thresh ACTIVE_THRESH]
                         [--random_seed RANDOM_SEED] [--cut_norm CUT_NORM] [--cut_diff CUT_DIFF]

Signature Analyzer GPU.

positional arguments:
  input                 Input matrix for decomposition. Signature Analyzer uses the format of (samples x features)
                        Assumes input is a .maf by default and will compute the 96-base context spectra if not provided
                          * Use {-type} to specific different input types

  -h, --help            show this help message and exit
  -t {maf,spectra,matrix}, --type {maf,spectra,matrix}
                        Input type. Specify whether input is a .maf, a 96 base context spectra, or an RNA expression matrix (default: 'maf')
                          * NOTE: for expression is is reccomended to use log-transformed & gaussian {--objective} function
  -n NRUNS, --nruns NRUNS
                        Number of iterations to run ARD-NMF. Significant speed up if GPU is available (default: 10)
  -o OUTDIR, --outdir OUTDIR
                        Directory to save outputs (default: '.')
  --reference {cosmic2,cosmic3,cosmic3_exome,cosmic3_DBS,cosmic3_ID,cosmic3_TSB,pcawg_SBS,pcawg_COMPOSITE,pcawg_COMPOSITE96,pcawg_SBS_ID,pcawg_SBS96_ID,polymerase_msi,polymerase_msi96}
                        Cosmic signatures to map to and provide results for. Support for Cosmic 2 & 3 (default: 'cosmic2')
                          * Reference:
  --hg_build HG_BUILD   Path to 2bit, human genome build for mapping mutational contexts. Required if mutational context is not provided (default: None)
  --cuda_int CUDA_INT   GPU to use. Defaults to (cuda: 0). If (None), or if no GPU is available, will default to CPU (default: 0)
  --verbose             Verbosity
  --K0 K0               Initial K0 parameter. If not provided, ARD-NMF starts with K0 = no. features
  --max_iter MAX_ITER   Maximum number of iterations for CAVI algorithm if not reached {--tolerance} (default: 10000)
  --del_ DEL_           Early stop condition based on lambda change (default: 1)
  --tolerance TOLERANCE
                        Early stop condition based on max lambda entry (default: 1e-10)
  --phi PHI             Dispersion parameter for CAVI. See paper for details on selection (default: 1).
                           * NOTE: If using gaussian {--objective}, scaled by the variance of input matrix
  --a A                 Hyperparamter for lambda. We recommend trying various values of a. Smaller values will result in
                        sparser results. Reccommended starting hyperparameter: a = log(F+N). (default: 10.0)
  --b B                 Hyperparamter for lambda. Default is computed automatically as speicified by Tan and Fevotte 2013
  --objective {poisson,gaussian}
                        Objective function for ARD-NMF. (default: 'poisson')
                          * mutational signatures --> poisson (DEFAULT)
                          * log-norm expression   --> gaussian
  --prior_on_W {L1,L2}  Prior on W matrix L1 (exponential) or L2 (half-normal) (default: 'L1')
  --prior_on_H {L1,L2}  Prior on H matrix L1 (exponential) or L2 (half-normal) (default: 'L1')
  --report_freq REPORT_FREQ
                        Number of iterations between progress reports (default: 250)
  --active_thresh ACTIVE_THRESH
                        Active threshold for consdiering a threshold relevant (default: 0.01)
  --random_seed RANDOM_SEED
                        Random seed for decomposition
  --cut_norm CUT_NORM   Min normalized value for mean signature. Used in marker selection during post-processing (matrix). (default: 0.5)
  --cut_diff CUT_DIFF   Difference between mean selected signature and mean unselected signatures for marker selection (matrix). (default: 1.0)
[user@cn3335 ~]$ git clone 
[user@cn3335 ~]$ signatureanalyzer \
        getzlab-SignatureAnalyzer/examples/example_luad_spectra_1.tsv \
        --type spectra \
        --nruns 2 \
        --max_iter 100 \
        --outdir . 
---------- S I G N A T U R E  A N A L Y Z E R  ----------
   * Using cosmic2 signatures
   * Saving ARD-NMF outputs to ./nmf_output.h5
   * Running ARD-NMF...
        0/1: nit=  100 K=11     del=0.10181770
        1/1: nit=  100 K=12     del=0.01765841
   * Run 0 had lowest objective with mode (n=1) K = 11.
   * Saving report plots to .
connect localhost port 6000: Connection refused
Plotting cosmic2 Attributions Barplot:
Plotting K Histogram:
Plotting cosmic2 Cosine Similarity:
[user@cn3335 ~]$ ls *pdf
cosine_similarity_plot.pdf  k_dist.pdf  signature_contributions.pdf  signature_stacked_barplot.pdf

[user@cn3335 ~]$ git clone SignatureAnalyzerGPU
[user@cn3335 ~]$ sed -i 's|.NMF_functions|NMF_functions|g' SignatureAnalyzerGPU/
[user@cn3335 ~]$ shell
Singularity> python SignatureAnalyzerGPU/ \
         --data SignatureAnalyzerGPU/example_data/POLEMSI_counts_matrix.txt \
         --prior_on_W L1 --prior_on_H L2 \
         --output_dir . \
         --parameters_file SignatureAnalyzerGPU/example_data/POLEMSI_params.txt \
         --max_iter 20000 \
         --labeled \
         --tolerance 1e-7
Reading data frame from SignatureAnalyzerGPU/example_data/POLEMSI_counts_matrix.txt
NMF class initalized.
NMF data and parameters set.
   * Using GPU: cuda:0
a = 10
b = 0.00029431690916802025
nit=    0 K=   96 | obj=349976.25       b_div=532412.19 lam=12.45       del=0.99999708  sumW=22351.09   sumH=3212.82
nit=  100 K=   13 | obj=-2084853.38     b_div=138738.62 lam=4.56        del=0.14132172  sumW=5573.62    sumH=1549.88
nit=  200 K=   10 | obj=-2149350.25     b_div=142379.19 lam=4.43        del=0.00341199  sumW=5411.34    sumH=1310.39
nit=  300 K=   10 | obj=-2152107.50     b_div=143538.61 lam=4.32        del=0.20345686  sumW=5264.86    sumH=1300.05
nit=  400 K=    9 | obj=-2173389.25     b_div=143575.67 lam=4.31        del=0.00034595  sumW=5238.82    sumH=1251.14
nit=  500 K=    9 | obj=-2173472.00     b_div=143554.73 lam=4.31        del=0.00022273  sumW=5232.94    sumH=1250.67
nit=  600 K=    9 | obj=-2173527.00     b_div=143521.78 lam=4.31        del=0.00028199  sumW=5231.32    sumH=1249.95
nit=  700 K=    9 | obj=-2173554.50     b_div=143527.19 lam=4.31        del=0.00013176  sumW=5229.51    sumH=1249.37
nit=  800 K=    9 | obj=-2173565.25     b_div=143532.72 lam=4.31        del=0.00005945  sumW=5227.17    sumH=1249.29
nit=  900 K=    9 | obj=-2173570.50     b_div=143537.41 lam=4.31        del=0.00002979  sumW=5226.03    sumH=1249.15
nit= 1000 K=    9 | obj=-2173573.50     b_div=143539.06 lam=4.31        del=0.00002086  sumW=5225.63    sumH=1248.88
nit= 1100 K=    9 | obj=-2173575.50     b_div=143540.06 lam=4.31        del=0.00001453  sumW=5225.83    sumH=1248.56
nit= 1200 K=    9 | obj=-2173576.50     b_div=143541.23 lam=4.31        del=0.00001465  sumW=5226.21    sumH=1248.27
nit= 1300 K=    9 | obj=-2173578.25     b_div=143541.31 lam=4.31        del=0.00001763  sumW=5226.56    sumH=1248.04
nit= 1400 K=    9 | obj=-2173579.50     b_div=143539.91 lam=4.31        del=0.00001517  sumW=5227.53    sumH=1247.71
nit= 1500 K=    9 | obj=-2173580.00     b_div=143538.98 lam=4.31        del=0.00001028  sumW=5228.32    sumH=1247.49
[user@cn3335 ~]$ exit
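
The interactive run above could also be wrapped in a batch script. The following is a minimal sketch, not a tested site recipe. It assumes a Slurm scheduler that accepts the same resources as the sinteractive call (8 CPUs, 12 GB of memory, 20 GB of lscratch). The names INPUT, OUTDIR, DRY_RUN and run_signatureanalyzer are hypothetical and introduced only for this example; the signatureanalyzer options are the ones used in the sample session. DRY_RUN defaults to 1, so the script only prints the command it would run.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --mem=12g
#SBATCH --gres=lscratch:20

# Sketch only: INPUT, OUTDIR, DRY_RUN and run_signatureanalyzer are
# hypothetical names chosen for this example, not part of SignatureAnalyzer.
INPUT=${INPUT:-getzlab-SignatureAnalyzer/examples/example_luad_spectra_1.tsv}
OUTDIR=${OUTDIR:-.}

run_signatureanalyzer() {
    # $1 = input spectra file, $2 = output directory.
    # Rebuild the positional parameters as the full command line, using
    # the same options as the interactive sample session.
    set -- signatureanalyzer "$1" --type spectra --nruns 2 --max_iter 100 --outdir "$2"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: print the command instead of executing it.
        echo "$*"
    else
        "$@"
    fi
}

# Load the module only for a real run; a dry run needs nothing installed.
if [ "${DRY_RUN:-1}" != "1" ]; then
    module load SignatureAnalyzer
fi

run_signatureanalyzer "$INPUT" "$OUTDIR"
```

Running the script directly with bash prints the assembled command for inspection. Setting DRY_RUN=0 in the environment before submitting it with sbatch would execute the real analysis, assuming the job inherits the submitting shell's environment.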