SignatureAnalyzer: identification of somatic mutational signatures
SignatureAnalyzer is a tool for the identification of somatic mutational signatures using Bayesian non-negative matrix factorization (NMF).
References:
- Amaro Taylor-Weiner, François Aguet, Nicholas J. Haradhvala, Sager Gosai, Shankara Anand, Jaegil Kim, Kristin Ardlie, Eliezer M. Van Allen and Gad Getz.
  Scaling computational genomics to millions of individuals with GPUs.
  Genome Biology (2019) 20:228
Documentation
Important Notes
- Module Name: SignatureAnalyzer (see the modules page for more information)
- Unusual environment variables set
  - SA_HOME: installation directory
  - SA_BIN: executable directory
  - SA_DATA: sample data directory
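Once the module is loaded, the variables listed above can be checked from the shell. The following is a minimal sketch; the function name `show_sa_env` is illustrative and not something the module provides, and the values printed depend on the installation.

```shell
# Illustrative helper (not part of the module): report each variable
# named in the notes above, or say so if it is not set in this shell.
show_sa_env() {
    for name in SA_HOME SA_BIN SA_DATA; do
        if value=$(printenv "$name"); then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is not set (is the module loaded?)\n' "$name"
        fi
    done
}

show_sa_env
```

If any variable is reported as not set, run `module load SignatureAnalyzer` in the same shell and try again.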
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=12g -c8 --gres=lscratch:20
[user@cn3335 ~]$ module load SignatureAnalyzer
[+] Loading singularity 4.0.1 on cn3335
[+] Loading SignatureAnalyzer 20240208
[user@cn3335 ~]$ signatureanalyzer -h
usage: signatureanalyzer [-h] [-t {maf,spectra,matrix}] [-n NRUNS] [-o OUTDIR]
                         [--reference {cosmic2,cosmic3,cosmic3_exome,cosmic3_DBS,cosmic3_ID,cosmic3_TSB,pcawg_SBS,pcawg_COMPOSITE,pcawg_COMPOSITE96,pcawg_SBS_ID,pcawg_SBS96_ID,polymerase_msi,polymerase_msi96}]
                         [--hg_build HG_BUILD] [--cuda_int CUDA_INT] [--verbose]
                         [--K0 K0] [--max_iter MAX_ITER] [--del_ DEL_]
                         [--tolerance TOLERANCE] [--phi PHI] [--a A] [--b B]
                         [--objective {poisson,gaussian}] [--prior_on_W {L1,L2}]
                         [--prior_on_H {L1,L2}] [--report_freq REPORT_FREQ]
                         [--active_thresh ACTIVE_THRESH]
                         [--random_seed RANDOM_SEED] [--cut_norm CUT_NORM]
                         [--cut_diff CUT_DIFF]
                         input

Signature Analyzer GPU.

positional arguments:
  input                 Input matrix for decomposition. Signature Analyzer uses the format of (samples x features)
                        Assumes input is a .maf by default and will compute the 96-base context spectra if not provided
                        * Use {-type} to specific different input types

options:
  -h, --help            show this help message and exit
  -t {maf,spectra,matrix}, --type {maf,spectra,matrix}
                        Input type. Specify whether input is a .maf, a 96 base context spectra, or an RNA expression matrix (default: 'maf')
                        * NOTE: for expression is is reccomended to use log-transformed & gaussian {--objective} function
  -n NRUNS, --nruns NRUNS
                        Number of iterations to run ARD-NMF. Significant speed up if GPU is available (default: 10)
  -o OUTDIR, --outdir OUTDIR
                        Directory to save outputs (default: '.')
  --reference {cosmic2,cosmic3,cosmic3_exome,cosmic3_DBS,cosmic3_ID,cosmic3_TSB,pcawg_SBS,pcawg_COMPOSITE,pcawg_COMPOSITE96,pcawg_SBS_ID,pcawg_SBS96_ID,polymerase_msi,polymerase_msi96}
                        Cosmic signatures to map to and provide results for.
                        Support for Cosmic 2 & 3 (default: 'cosmic2')
                        * Reference: https://cancer.sanger.ac.uk/cosmic/signatures
  --hg_build HG_BUILD   Path to 2bit, human genome build for mapping mutational contexts. Required if mutational context is not provided (default: None)
  --cuda_int CUDA_INT   GPU to use. Defaults to (cuda: 0). If (None), or if no GPU is available, will default to CPU (default: 0)
  --verbose             Verbosity
  --K0 K0               Initial K0 parameter. If not provided, ARD-NMF starts with K0 = no. features
  --max_iter MAX_ITER   Maximum number of iterations for CAVI algorithm if not reached {--tolerance} (default: 10000)
  --del_ DEL_           Early stop condition based on lambda change (default: 1)
  --tolerance TOLERANCE
                        Early stop condition based on max lambda entry (default: 1e-10)
  --phi PHI             Dispersion parameter for CAVI. See paper for details on selection (default: 1).
                        * NOTE: If using gaussian {--objective}, scaled by the variance of input matrix
  --a A                 Hyperparamter for lambda. We recommend trying various values of a. Smaller values will result in sparser results.
                        Reccommended starting hyperparameter: a = log(F+N). (default: 10.0)
  --b B                 Hyperparamter for lambda. Default is computed automatically as speicified by Tan and Fevotte 2013
  --objective {poisson,gaussian}
                        Objective function for ARD-NMF. (default: 'poisson')
                        * mutational signatures --> poisson (DEFAULT)
                        * log-norm expression --> gaussian
  --prior_on_W {L1,L2}  Prior on W matrix L1 (exponential) or L2 (half-normal) (default: 'L1')
  --prior_on_H {L1,L2}  Prior on H matrix L1 (exponential) or L2 (half-normal) (default: 'L1')
  --report_freq REPORT_FREQ
                        Number of iterations between progress reports (default: 250)
  --active_thresh ACTIVE_THRESH
                        Active threshold for consdiering a threshold relevant (default: 0.01)
  --random_seed RANDOM_SEED
                        Random seed for decomposition
  --cut_norm CUT_NORM   Min normalized value for mean signature. Used in marker selection during post-processing (matrix).
                        (default: 0.5)
  --cut_diff CUT_DIFF   Difference between mean selected signature and mean unselected signatures for marker selection (matrix). (default: 1.0)

[user@cn3335 ~]$ git clone https://github.com/getzlab/SignatureAnalyzer
[user@cn3335 ~]$ signatureanalyzer \
    getzlab-SignatureAnalyzer/examples/example_luad_spectra_1.tsv \
    --type spectra \
    --nruns 2 \
    --max_iter 100 \
    --outdir .
---------------------------------------------------------
---------- S I G N A T U R E A N A L Y Z E R ----------
---------------------------------------------------------
   * Using cosmic2 signatures
   * Saving ARD-NMF outputs to ./nmf_output.h5
   * Running ARD-NMF...
     0/1: nit= 100 K=11 del=0.10181770
     1/1: nit= 100 K=12 del=0.01765841
   * Run 0 had lowest objective with mode (n=1) K = 11.
   * Saving report plots to .
connect localhost port 6000: Connection refused
Plotting cosmic2 Attributions Barplot:
Plotting K Histogram:
Plotting cosmic2 Cosine Similarity:
[user@cn3335 ~]$ ls *pdf
cosine_similarity_plot.pdf  k_dist.pdf  signature_contributions.pdf  signature_stacked_barplot.pdf

[user@cn3335 ~]$ git clone https://github.com/broadinstitute/SignatureAnalyzer-GPU SignatureAnalyzerGPU
[user@cn3335 ~]$ sed -i 's|.NMF_functions|NMF_functions|g' SignatureAnalyzerGPU/ARD_NMF.py
[user@cn3335 ~]$ shell
Singularity> python SignatureAnalyzerGPU/SignatureAnalyzer-GPU.py \
    --data SignatureAnalyzerGPU/example_data/POLEMSI_counts_matrix.txt \
    --prior_on_W L1 --prior_on_H L2 \
    --output_dir . \
    --parameters_file SignatureAnalyzerGPU/example_data/POLEMSI_params.txt \
    --max_iter 20000 \
    --labeled \
    --tolerance 1e-7
Reading data frame from SignatureAnalyzerGPU/example_data/POLEMSI_counts_matrix.txt
NMF class initalized.
0
NMF data and parameters set.
* Using GPU: cuda:0
%%%%%%%%%%%%%%%
a = 10
b = 0.00029431690916802025
%%%%%%%%%%%%%%%
nit=    0 K= 96 | obj=349976.25 b_div=532412.19 lam=12.45 del=0.99999708 sumW=22351.09 sumH=3212.82
nit=  100 K= 13 | obj=-2084853.38 b_div=138738.62 lam=4.56 del=0.14132172 sumW=5573.62 sumH=1549.88
nit=  200 K= 10 | obj=-2149350.25 b_div=142379.19 lam=4.43 del=0.00341199 sumW=5411.34 sumH=1310.39
nit=  300 K= 10 | obj=-2152107.50 b_div=143538.61 lam=4.32 del=0.20345686 sumW=5264.86 sumH=1300.05
nit=  400 K=  9 | obj=-2173389.25 b_div=143575.67 lam=4.31 del=0.00034595 sumW=5238.82 sumH=1251.14
nit=  500 K=  9 | obj=-2173472.00 b_div=143554.73 lam=4.31 del=0.00022273 sumW=5232.94 sumH=1250.67
nit=  600 K=  9 | obj=-2173527.00 b_div=143521.78 lam=4.31 del=0.00028199 sumW=5231.32 sumH=1249.95
nit=  700 K=  9 | obj=-2173554.50 b_div=143527.19 lam=4.31 del=0.00013176 sumW=5229.51 sumH=1249.37
nit=  800 K=  9 | obj=-2173565.25 b_div=143532.72 lam=4.31 del=0.00005945 sumW=5227.17 sumH=1249.29
nit=  900 K=  9 | obj=-2173570.50 b_div=143537.41 lam=4.31 del=0.00002979 sumW=5226.03 sumH=1249.15
nit= 1000 K=  9 | obj=-2173573.50 b_div=143539.06 lam=4.31 del=0.00002086 sumW=5225.63 sumH=1248.88
nit= 1100 K=  9 | obj=-2173575.50 b_div=143540.06 lam=4.31 del=0.00001453 sumW=5225.83 sumH=1248.56
nit= 1200 K=  9 | obj=-2173576.50 b_div=143541.23 lam=4.31 del=0.00001465 sumW=5226.21 sumH=1248.27
nit= 1300 K=  9 | obj=-2173578.25 b_div=143541.31 lam=4.31 del=0.00001763 sumW=5226.56 sumH=1248.04
nit= 1400 K=  9 | obj=-2173579.50 b_div=143539.91 lam=4.31 del=0.00001517 sumW=5227.53 sumH=1247.71
nit= 1500 K=  9 | obj=-2173580.00 b_div=143538.98 lam=4.31 del=0.00001028 sumW=5228.32 sumH=1247.49
...
[user@cn3335 ~]$ exit
[user@biowulf]$
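The help text for `--a` recommends a starting value of a = log(F+N). The sketch below shows one way to estimate that number from a tab-separated input matrix. It rests on several assumptions that are not confirmed by the tool's documentation: `suggest_a` is a hypothetical helper, the file is assumed to have one header row and one leading label column, F and N are taken to be the two matrix dimensions, and `log` is taken to be the natural logarithm. Since only the sum F+N is used, the orientation of the matrix does not affect the result.

```shell
# Hypothetical helper: estimate a = log(F + N) from a tab-separated matrix.
# Assumes one header row and one leading label column; adjust if your file differs.
suggest_a() {
    awk -F'\t' '
        NR == 2 { cols = NF - 1 }      # data columns, excluding the label column
        NR > 1 && NF > 0 { rows++ }    # data rows, excluding the header
        END {
            if (rows + cols > 0) printf "%.4f\n", log(rows + cols)   # awk log() is the natural log
            else exit 1
        }
    ' "$1"
}

# Small self-contained demonstration on a made-up 3 x 2 matrix.
tmp=$(mktemp)
printf 'context\ts1\ts2\nA[C>A]A\t1\t2\nA[C>A]C\t3\t4\nA[C>A]G\t5\t6\n' > "$tmp"
suggest_a "$tmp"   # 3 data rows + 2 data columns, so this prints log(5)
rm -f "$tmp"
```

The printed value could then be passed to a run, for example `--a "$(suggest_a my_spectra.tsv)"`, where `my_spectra.tsv` stands in for your own input file. Treat it as a starting point only; the help text itself recommends trying several values of a.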