GISTIC, or Genomic Identification of Significant Targets in Cancer, identifies regions of the genome that are significantly amplified or deleted across a set of samples. Each aberration is assigned a G-score that considers the amplitude of the aberration as well as the frequency of its occurrence across samples. False Discovery Rate q-values are then calculated for the aberrant regions, and regions with q-values below a user-defined threshold are considered significant. For each significant region, a “peak region” is identified, which is the part of the aberrant region with greatest amplitude and frequency of alteration. In addition, a “wide peak” is determined using a leave-one-out algorithm to allow for errors in the boundaries in a single sample. The “wide peak” boundaries are more robust for identifying the most likely gene targets in the region. Each significantly aberrant region is also tested to determine whether it results primarily from broad events (longer than half a chromosome arm), focal events, or significant levels of both. The GISTIC module reports the genomic locations and calculated q-values for the aberrant regions. It identifies the samples that exhibit each significant amplification or deletion, and it lists genes found in each “wide peak” region.
Allocate an interactive session and run the program. Sample session based on $GISTIC_HOME/run_gistic_example:
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load gistic
[user@cn3144 ~]$ cd /data/$USER
[user@cn3144 user]$ basedir=$PWD/example_results
[user@cn3144 user]$ mkdir $basedir
[user@cn3144 user]$ segfile=$GISTIC_HOME/examplefiles/segmentationfile.txt
[user@cn3144 user]$ markersfile=$GISTIC_HOME/examplefiles/markersfile.txt
[user@cn3144 user]$ refgenefile=$GISTIC_HOME/refgenefiles/hg16.mat
[user@cn3144 user]$ alf=$GISTIC_HOME/examplefiles/arraylistfile.txt
[user@cn3144 user]$ cnvfile=$GISTIC_HOME/examplefiles/cnvfile.txt
[user@cn3144 user]$ gistic2 \
    -b $basedir \
    -seg $segfile \
    -mk $markersfile \
    -refgene $refgenefile \
    -alf $alf \
    -cnv $cnvfile \
    -genegistic 1 \
    -smallmem 1 \
    -broad 1 \
    -brlen 0.5 \
    -conf 0.90 \
    -armpeel 1 \
    -savegene 1 \
    -gcm extreme

[user@cn3144 user]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch input file (e.g. gistic.sh). For example:
#!/bin/sh
set -e
module load gistic

basedir=$PWD/example_results
mkdir $basedir

segfile=$GISTIC_HOME/examplefiles/segmentationfile.txt
markersfile=$GISTIC_HOME/examplefiles/markersfile.txt
refgenefile=$GISTIC_HOME/refgenefiles/hg16.mat
alf=$GISTIC_HOME/examplefiles/arraylistfile.txt
cnvfile=$GISTIC_HOME/examplefiles/cnvfile.txt

## call script that sets MCR environment and calls GISTIC executable
gistic2 \
    -b $basedir \
    -seg $segfile \
    -mk $markersfile \
    -refgene $refgenefile \
    -alf $alf \
    -cnv $cnvfile \
    -genegistic 1 \
    -smallmem 0 \
    -broad 1 \
    -brlen 0.5 \
    -conf 0.90 \
    -armpeel 1 \
    -savegene 1 \
    -gcm extreme
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] gistic.sh
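The batch script above assumes that every input file is present. Because a batch job may sit in the queue before it starts, it can be useful to fail early with a clear message when a file is missing. The following is a minimal sketch of such a check; the function name check_inputs is hypothetical and is not part of GISTIC or the gistic module.

```shell
#!/bin/sh
# Hypothetical helper (not part of GISTIC): report every input file that is
# missing or unreadable and return non-zero if any problem was found.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable input: $f" >&2
            status=1
        fi
    done
    return $status
}

# Example use inside the batch script, before the gistic2 call:
#   check_inputs "$segfile" "$markersfile" "$refgenefile" "$alf" "$cnvfile" || exit 1
```

Checking all files before returning, rather than stopping at the first failure, lists every missing input in a single run.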
Create a swarmfile (e.g. gistic.swarm). For example:
cd /data/$USER/gistic/set1 \
  && mkdir results \
  && gistic2 -b results -seg segfile.txt -mk markersfile.txt -refgene $GISTIC_HOME/refgenefiles/hg38.UCSC.add_miR.160920.refgene.mat -alf arraylistfile.txt -cnv cnvfile.txt -genegistic 1 -smallmem 0 -broad 1 -brlen 0.5 -conf 0.90 -armpeel 1 -savegene 1 -gcm extreme
cd /data/$USER/gistic/set2 \
  && mkdir results \
  && gistic2 -b results -seg segfile.txt -mk markersfile.txt -refgene $GISTIC_HOME/refgenefiles/hg38.UCSC.add_miR.160920.refgene.mat -alf arraylistfile.txt -cnv cnvfile.txt -genegistic 1 -smallmem 0 -broad 1 -brlen 0.5 -conf 0.90 -armpeel 1 -savegene 1 -gcm extreme
cd /data/$USER/gistic/set3 \
  && mkdir results \
  && gistic2 -b results -seg segfile.txt -mk markersfile.txt -refgene $GISTIC_HOME/refgenefiles/hg38.UCSC.add_miR.160920.refgene.mat -alf arraylistfile.txt -cnv cnvfile.txt -genegistic 1 -smallmem 0 -broad 1 -brlen 0.5 -conf 0.90 -armpeel 1 -savegene 1 -gcm extreme
cd /data/$USER/gistic/set4 \
  && mkdir results \
  && gistic2 -b results -seg segfile.txt -mk markersfile.txt -refgene $GISTIC_HOME/refgenefiles/hg38.UCSC.add_miR.160920.refgene.mat -alf arraylistfile.txt -cnv cnvfile.txt -genegistic 1 -smallmem 0 -broad 1 -brlen 0.5 -conf 0.90 -armpeel 1 -savegene 1 -gcm extreme
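Since the swarmfile commands above differ only in the data set directory, the file can also be generated with a loop instead of being written by hand. The sketch below assumes the same layout as the example (directories under /data/$USER/gistic, each holding the same input file names); the function name make_gistic_swarm is hypothetical. It writes each command on a single line rather than using backslash continuations.

```shell
#!/bin/sh
# Hypothetical helper: print one swarm command line per data set directory.
# The format string is single-quoted so that $USER and $GISTIC_HOME stay
# literal in the swarmfile and are expanded only when each subjob runs.
make_gistic_swarm() {
    for d in "$@"; do
        printf 'cd /data/$USER/gistic/%s && mkdir results && gistic2 -b results -seg segfile.txt -mk markersfile.txt -refgene $GISTIC_HOME/refgenefiles/hg38.UCSC.add_miR.160920.refgene.mat -alf arraylistfile.txt -cnv cnvfile.txt -genegistic 1 -smallmem 0 -broad 1 -brlen 0.5 -conf 0.90 -armpeel 1 -savegene 1 -gcm extreme\n' "$d"
    done
}

# Write one command line for each of the four example data sets.
make_gistic_swarm set1 set2 set3 set4 > gistic.swarm
```

Adding a data set then only requires passing another directory name to the function.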
Submit this job using the swarm command.
swarm -f gistic.swarm [-g #] [-t #] --module gistic
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file). |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module gistic | Loads the GISTIC module for each subjob in the swarm |