SignatureAnalyzer is a tool for the identification of somatic mutational signatures using Bayesian non-negative matrix factorization (NMF).
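For intuition about what a non-negative factorization does, here is a minimal sketch of plain NMF with multiplicative updates that minimize a Poisson-type (generalized KL) objective, written in pure Python. It is not the Bayesian ARD-NMF that SignatureAnalyzer implements: there are no priors and no automatic selection of the number of signatures. The function names (`matmul`, `kl_divergence`, `nmf_kl`) and the toy matrix are hypothetical and chosen only for illustration.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices given as lists (or tuples) of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def kl_divergence(V, R, eps=1e-12):
    """Generalized KL divergence D(V || R) between data V and reconstruction R."""
    total = 0.0
    for v_row, r_row in zip(V, R):
        for v, r in zip(v_row, r_row):
            if v > 0:
                total += v * math.log(v / (r + eps))
            total += r - v
    return total


def nmf_kl(V, k, n_iter=200, seed=0, eps=1e-12):
    """Factor V (samples x features) into W (samples x k) and H (k x features)."""
    rng = random.Random(seed)
    n, f = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(f)] for _ in range(k)]

    def ratio(W, H):
        # Element-wise V / (W H), the quantity both updates are built from.
        R = matmul(W, H)
        return [[v / (r + eps) for v, r in zip(vr, rr)] for vr, rr in zip(V, R)]

    for _ in range(n_iter):
        # H <- H * (W^T Q) / (column sums of W), with Q = V / (W H)
        Wt = list(zip(*W))
        num = matmul(Wt, ratio(W, H))
        H = [[h * x / (sum(w_col) + eps) for h, x in zip(h_row, n_row)]
             for h_row, n_row, w_col in zip(H, num, Wt)]
        # W <- W * (Q H^T) / (row sums of H)
        Ht = list(zip(*H))
        num = matmul(ratio(W, H), Ht)
        h_sums = [sum(h_row) for h_row in H]
        W = [[w * x / (s + eps) for w, x, s in zip(w_row, n_row, h_sums)]
             for w_row, n_row in zip(W, num)]
    return W, H


# Toy data: 4 samples x 4 features built from 2 underlying components.
V = matmul([[5, 0], [0, 7], [3, 3], [1, 6]], [[1, 2, 0, 4], [3, 0, 2, 1]])
W, H = nmf_kl(V, k=2)
```

In signature terms, the rows of `H` play the role of signatures over features and the rows of `W` hold each sample's exposure to them, following the (samples x features) convention that the tool's help text describes.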
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=12g -c8 --gres=lscratch:20
[user@cn3335 ~]$ module load SignatureAnalyzer
[+] Loading singularity 4.0.1 on cn3335
[+] Loading SignatureAnalyzer 20240208

[user@cn3335 ~]$ signatureanalyzer -h
usage: signatureanalyzer [-h] [-t {maf,spectra,matrix}] [-n NRUNS] [-o OUTDIR]
                         [--reference {cosmic2,cosmic3,cosmic3_exome,cosmic3_DBS,cosmic3_ID,cosmic3_TSB,pcawg_SBS,pcawg_COMPOSITE,pcawg_COMPOSITE96,pcawg_SBS_ID,pcawg_SBS96_ID,polymerase_msi,polymerase_msi96}]
                         [--hg_build HG_BUILD] [--cuda_int CUDA_INT] [--verbose] [--K0 K0]
                         [--max_iter MAX_ITER] [--del_ DEL_] [--tolerance TOLERANCE] [--phi PHI]
                         [--a A] [--b B] [--objective {poisson,gaussian}]
                         [--prior_on_W {L1,L2}] [--prior_on_H {L1,L2}]
                         [--report_freq REPORT_FREQ] [--active_thresh ACTIVE_THRESH]
                         [--random_seed RANDOM_SEED] [--cut_norm CUT_NORM] [--cut_diff CUT_DIFF]
                         input

Signature Analyzer GPU.

positional arguments:
  input                 Input matrix for decomposition. Signature Analyzer uses the format of (samples x features)
                        Assumes input is a .maf by default and will compute the 96-base context spectra if not provided
                        * Use {-type} to specific different input types

options:
  -h, --help            show this help message and exit
  -t {maf,spectra,matrix}, --type {maf,spectra,matrix}
                        Input type. Specify whether input is a .maf, a 96 base context spectra,
                        or an RNA expression matrix (default: 'maf')
                        * NOTE: for expression is is reccomended to use log-transformed & gaussian {--objective} function
  -n NRUNS, --nruns NRUNS
                        Number of iterations to run ARD-NMF. Significant speed up if GPU is available (default: 10)
  -o OUTDIR, --outdir OUTDIR
                        Directory to save outputs (default: '.')
  --reference {cosmic2,cosmic3,cosmic3_exome,cosmic3_DBS,cosmic3_ID,cosmic3_TSB,pcawg_SBS,pcawg_COMPOSITE,pcawg_COMPOSITE96,pcawg_SBS_ID,pcawg_SBS96_ID,polymerase_msi,polymerase_msi96}
                        Cosmic signatures to map to and provide results for.
                        Support for Cosmic 2 & 3 (default: 'cosmic2')
                        * Reference: https://cancer.sanger.ac.uk/cosmic/signatures
  --hg_build HG_BUILD   Path to 2bit, human genome build for mapping mutational contexts.
                        Required if mutational context is not provided (default: None)
  --cuda_int CUDA_INT   GPU to use. Defaults to (cuda: 0). If (None), or if no GPU is available,
                        will default to CPU (default: 0)
  --verbose             Verbosity
  --K0 K0               Initial K0 parameter. If not provided, ARD-NMF starts with K0 = no. features
  --max_iter MAX_ITER   Maximum number of iterations for CAVI algorithm if not reached {--tolerance} (default: 10000)
  --del_ DEL_           Early stop condition based on lambda change (default: 1)
  --tolerance TOLERANCE
                        Early stop condition based on max lambda entry (default: 1e-10)
  --phi PHI             Dispersion parameter for CAVI. See paper for details on selection (default: 1).
                        * NOTE: If using gaussian {--objective}, scaled by the variance of input matrix
  --a A                 Hyperparamter for lambda. We recommend trying various values of a.
                        Smaller values will result in sparser results.
                        Reccommended starting hyperparameter: a = log(F+N). (default: 10.0)
  --b B                 Hyperparamter for lambda. Default is computed automatically as speicified by Tan and Fevotte 2013
  --objective {poisson,gaussian}
                        Objective function for ARD-NMF. (default: 'poisson')
                        * mutational signatures --> poisson (DEFAULT)
                        * log-norm expression --> gaussian
  --prior_on_W {L1,L2}  Prior on W matrix L1 (exponential) or L2 (half-normal) (default: 'L1')
  --prior_on_H {L1,L2}  Prior on H matrix L1 (exponential) or L2 (half-normal) (default: 'L1')
  --report_freq REPORT_FREQ
                        Number of iterations between progress reports (default: 250)
  --active_thresh ACTIVE_THRESH
                        Active threshold for consdiering a threshold relevant (default: 0.01)
  --random_seed RANDOM_SEED
                        Random seed for decomposition
  --cut_norm CUT_NORM   Min normalized value for mean signature. Used in marker selection
                        during post-processing (matrix). (default: 0.5)
  --cut_diff CUT_DIFF   Difference between mean selected signature and mean unselected signatures
                        for marker selection (matrix). (default: 1.0)

[user@cn3335 ~]$ git clone https://github.com/getzlab/SignatureAnalyzer
[user@cn3335 ~]$ signatureanalyzer \
    getzlab-SignatureAnalyzer/examples/example_luad_spectra_1.tsv \
    --type spectra \
    --nruns 2 \
    --max_iter 100 \
    --outdir .
---------------------------------------------------------
---------- S I G N A T U R E A N A L Y Z E R ----------
---------------------------------------------------------
   * Using cosmic2 signatures
   * Saving ARD-NMF outputs to ./nmf_output.h5
   * Running ARD-NMF...
        0/1: nit= 100 K=11 del=0.10181770
        1/1: nit= 100 K=12 del=0.01765841
   * Run 0 had lowest objective with mode (n=1) K = 11.
   * Saving report plots to .
connect localhost port 6000: Connection refused
Plotting cosmic2 Attributions Barplot:
Plotting K Histogram:
Plotting cosmic2 Cosine Similarity:

[user@cn3335 ~]$ ls *pdf
cosine_similarity_plot.pdf  k_dist.pdf  signature_contributions.pdf  signature_stacked_barplot.pdf

[user@cn3335 ~]$ git clone https://github.com/broadinstitute/SignatureAnalyzer-GPU SignatureAnalyzerGPU
[user@cn3335 ~]$ sed -i 's|.NMF_functions|NMF_functions|g' SignatureAnalyzerGPU/ARD_NMF.py
[user@cn3335 ~]$ shell
Singularity> python SignatureAnalyzerGPU/SignatureAnalyzer-GPU.py \
    --data SignatureAnalyzerGPU/example_data/POLEMSI_counts_matrix.txt \
    --prior_on_W L1 --prior_on_H L2 \
    --output_dir . \
    --parameters_file SignatureAnalyzerGPU/example_data/POLEMSI_params.txt \
    --max_iter 20000 \
    --labeled \
    --tolerance 1e-7
Reading data frame from SignatureAnalyzerGPU/example_data/POLEMSI_counts_matrix.txt
NMF class initalized.
0
NMF data and parameters set.
   * Using GPU: cuda:0
%%%%%%%%%%%%%%%
a = 10
b = 0.00029431690916802025
%%%%%%%%%%%%%%%
nit=    0 K= 96 | obj=349976.25    b_div=532412.19 lam=12.45 del=0.99999708 sumW=22351.09 sumH=3212.82
nit=  100 K= 13 | obj=-2084853.38 b_div=138738.62 lam=4.56 del=0.14132172 sumW=5573.62 sumH=1549.88
nit=  200 K= 10 | obj=-2149350.25 b_div=142379.19 lam=4.43 del=0.00341199 sumW=5411.34 sumH=1310.39
nit=  300 K= 10 | obj=-2152107.50 b_div=143538.61 lam=4.32 del=0.20345686 sumW=5264.86 sumH=1300.05
nit=  400 K=  9 | obj=-2173389.25 b_div=143575.67 lam=4.31 del=0.00034595 sumW=5238.82 sumH=1251.14
nit=  500 K=  9 | obj=-2173472.00 b_div=143554.73 lam=4.31 del=0.00022273 sumW=5232.94 sumH=1250.67
nit=  600 K=  9 | obj=-2173527.00 b_div=143521.78 lam=4.31 del=0.00028199 sumW=5231.32 sumH=1249.95
nit=  700 K=  9 | obj=-2173554.50 b_div=143527.19 lam=4.31 del=0.00013176 sumW=5229.51 sumH=1249.37
nit=  800 K=  9 | obj=-2173565.25 b_div=143532.72 lam=4.31 del=0.00005945 sumW=5227.17 sumH=1249.29
nit=  900 K=  9 | obj=-2173570.50 b_div=143537.41 lam=4.31 del=0.00002979 sumW=5226.03 sumH=1249.15
nit= 1000 K=  9 | obj=-2173573.50 b_div=143539.06 lam=4.31 del=0.00002086 sumW=5225.63 sumH=1248.88
nit= 1100 K=  9 | obj=-2173575.50 b_div=143540.06 lam=4.31 del=0.00001453 sumW=5225.83 sumH=1248.56
nit= 1200 K=  9 | obj=-2173576.50 b_div=143541.23 lam=4.31 del=0.00001465 sumW=5226.21 sumH=1248.27
nit= 1300 K=  9 | obj=-2173578.25 b_div=143541.31 lam=4.31 del=0.00001763 sumW=5226.56 sumH=1248.04
nit= 1400 K=  9 | obj=-2173579.50 b_div=143539.91 lam=4.31 del=0.00001517 sumW=5227.53 sumH=1247.71
nit= 1500 K=  9 | obj=-2173580.00 b_div=143538.98 lam=4.31 del=0.00001028 sumW=5228.32 sumH=1247.49
...

[user@cn3335 ~]$ exit
[user@biowulf]$
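The progress lines printed by the GPU run can be used to check how quickly the number of active components K settles. The sketch below parses lines of the form shown in the sample session with a regular expression. It assumes the log format is exactly what appears above (`nit=`, `K=`, `obj=`, `del=` fields), which may differ between versions. The names `parse_progress` and `first_stable_iteration` are hypothetical helpers, not part of SignatureAnalyzer.

```python
import re

# Matches progress lines such as:
#   nit= 100 K= 13 | obj=-2084853.38 b_div=138738.62 lam=4.56 del=0.14132172 ...
# Assumption: field names and order follow the sample session output.
PROGRESS_RE = re.compile(
    r"nit=\s*(?P<nit>\d+)\s+K=\s*(?P<K>\d+)\s*\|\s*"
    r"obj=(?P<obj>-?\d+(?:\.\d+)?)"
    r".*?del=(?P<delta>\d+(?:\.\d+)?)"
)


def parse_progress(text):
    """Return one dict per progress line found in the captured log text."""
    rows = []
    for m in PROGRESS_RE.finditer(text):
        rows.append({
            "nit": int(m.group("nit")),
            "K": int(m.group("K")),
            "obj": float(m.group("obj")),
            "del": float(m.group("delta")),
        })
    return rows


def first_stable_iteration(rows):
    """Return the earliest reported iteration from which K equals its final value."""
    if not rows:
        return None
    final_k = rows[-1]["K"]
    stable_from = rows[-1]["nit"]
    for row in reversed(rows):
        if row["K"] != final_k:
            break
        stable_from = row["nit"]
    return stable_from


# A few lines copied from the sample session above.
sample_log = """\
nit= 0 K= 96 | obj=349976.25 b_div=532412.19 lam=12.45 del=0.99999708 sumW=22351.09 sumH=3212.82
nit= 100 K= 13 | obj=-2084853.38 b_div=138738.62 lam=4.56 del=0.14132172 sumW=5573.62 sumH=1549.88
nit= 400 K= 9 | obj=-2173389.25 b_div=143575.67 lam=4.31 del=0.00034595 sumW=5238.82 sumH=1251.14
nit= 500 K= 9 | obj=-2173472.00 b_div=143554.73 lam=4.31 del=0.00022273 sumW=5232.94 sumH=1250.67
"""
rows = parse_progress(sample_log)
print(first_stable_iteration(rows))
```

To use it on a real run, redirect the program output to a file (for example with `> run.log`) and pass the file contents to `parse_progress`.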