High-Performance Computing at the NIH
gistic on Biowulf & Helix

Program: gistic (Broad Institute Cancer Publications). GISTIC (Genomic Identification of Significant Targets in Cancer) identifies regions of the genome that are significantly amplified or deleted across a set of samples. Each aberration is assigned a G-score that reflects both the amplitude of the aberration and the frequency of its occurrence across samples.

Program Location

You can add gistic to your Modules environment with the "module" command, as in the example:

[user@biowulf]$ module load gistic           (load the current version)
[user@biowulf]$ module list                  (see what version is loaded)
Currently Loaded Modulefiles:
  1) gistic/2.0.22

Gistic Example

The gistic release includes an example script, "run_gistic_example", an adapted version of which is shown below. To view the unmodified example script for the current version of gistic, load the gistic module and then run less $(which run_gistic_example).

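## run example GISTIC analysis

## output directory (created under the current working directory)
echo --- creating output directory ---
basedir=$PWD/example_results
mkdir -p "$basedir"

echo --- running GISTIC ---
## input file definitions (exampledir and refgenedir are placeholders; see the unmodified script for the real paths)
exampledir=/path/to/examplefiles
refgenedir=/path/to/refgenefiles
segfile=$exampledir/segmentationfile.txt
markersfile=$exampledir/markersfile.txt
refgenefile=$refgenedir/hg16.mat    ## hg16 assumed here; use the .mat file matching your marker build
alf=$exampledir/arraylistfile.txt
cnvfile=$exampledir/cnvfile.txt
## call script that sets MCR environment and calls GISTIC executable
gistic2 -b "$basedir" -seg "$segfile" -mk "$markersfile" -refgene "$refgenefile" -alf "$alf" -cnv "$cnvfile" -genegistic 1 -smallmem 1 -broad 1 -brlen 0.5 -conf 0.90 -armpeel 1 -savegene 1 -gcm extreme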
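The example passes five input paths ($segfile, $markersfile, $refgenefile, $alf, $cnvfile) to gistic2, and a wrong path is easier to diagnose before a run starts than partway through it. The function below is a minimal sketch of such a pre-flight check; its name, check_gistic_inputs, is hypothetical and it is not part of the gistic release.

```shell
# check_gistic_inputs: hypothetical helper (not part of gistic) that verifies
# every input file exists and is readable before gistic2 is called.
# Usage: check_gistic_inputs FILE...
check_gistic_inputs() {
  local f missing=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable input: $f" >&2
      missing=$((missing + 1))
    fi
  done
  # exit status is non-zero when any input is absent
  [ "$missing" -eq 0 ]
}
```

In the example script it could guard the run, e.g. check_gistic_inputs "$segfile" "$markersfile" "$refgenefile" "$alf" "$cnvfile" && gistic2 ...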

Example Input Files

Figure 1. gistic Example Input Files

  -rwxr-xr-x 1 gistic gistic    1526 Jan 31  2014 arraylistfile.txt
  -rwxr-xr-x 1 gistic gistic  388260 Jan 31  2014 cnvfile.txt
  -rwxr-xr-x 1 gistic gistic 2962077 Jan 31  2014 markersfile.txt
  -rwxr-xr-x 1 gistic gistic  936633 Jan 31  2014 segmentationfile.txt

Note that the fifth input file, the reference gene file (one of "refgenefiles/hg16.mat", "refgenefiles/hg17.mat", "refgenefiles/hg18.mat" or "refgenefiles/hg19.mat"), is supplied with the gistic release and is stored as a MATLAB data file.
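Because the reference gene file should match the genome build used for your marker and segment coordinates, a small helper can map a build name to one of the four files listed above. This is a sketch under assumptions: the function name refgene_for_build is hypothetical, and the refgenefiles directory path is passed in as an argument because its location on Biowulf is not given here.

```shell
# refgene_for_build: hypothetical helper that maps a genome build name to one
# of the refgene .mat files supplied with gistic (hg16, hg17, hg18, hg19).
# Usage: refgene_for_build REFGENE_DIR BUILD
refgene_for_build() {
  local refdir=$1 build=$2
  case "$build" in
    hg16|hg17|hg18|hg19)
      printf '%s/%s.mat\n' "$refdir" "$build"
      ;;
    *)
      echo "build not among hg16-hg19: $build" >&2
      return 1
      ;;
  esac
}
```

For example, refgenefile=$(refgene_for_build /path/to/refgenefiles hg19) sets refgenefile to /path/to/refgenefiles/hg19.mat, where /path/to/refgenefiles is a placeholder.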

Running the gistic example

Run the example script "run_gistic_example" as follows:

  1. Start an interactive session on a compute node with "sinteractive".
  2. Change to the directory, "MyDir", where you want the example results written.
  3. Load the "gistic" environment module.
  4. Run "run_gistic_example". The script creates the directory "MyDir/example_results" and writes its output files there.

[user@biowulf]$ sinteractive
salloc.exe: Granted job allocation 1069233
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0032 are ready for job
[user@biowulf]$ cd MyDir
[user@biowulf]$ module load gistic
[user@biowulf]$ run_gistic_example

Example Output Files

The directory "MyDir/example_results" now contains the files shown in Figure 2.

Figure 2. Files in directory "example_results/"
Files that share a common prefix hold data for the same graph.

  -rw-r--r-- 1 gistic gistic 12289865 Aug  3 14:31 all_data_by_genes.txt
  -rw-r--r-- 1 gistic gistic    46891 Aug  3 14:28 all_lesions.conf_90.txt
  -rw-r--r-- 1 gistic gistic  4336570 Aug  3 14:35 all_thresholded.by_genes.txt
  -rw-r--r-- 1 gistic gistic     1346 Aug  3 14:28 amp_genes.conf_90.txt
  -rw-r--r-- 1 gistic gistic    24816 Aug  3 14:28 amp_qplot.pdf
  -rw-r--r-- 1 gistic gistic    12530 Aug  3 14:28 amp_qplot.png
  -rw-r--r-- 1 gistic gistic     1525 Aug  3 14:25 arraylistfile.txt
  -rw-r--r-- 1 gistic gistic 11610871 Aug  3 14:34 broad_data_by_genes.txt
  -rw-r--r-- 1 gistic gistic     1682 Aug  3 14:29 broad_significance_results.txt
  -rw-r--r-- 1 gistic gistic    26799 Aug  3 14:30 broad_values_by_arm.txt
  -rw-r--r-- 1 gistic gistic  1051570 Aug  3 14:25 D.cap1.5.mat
  -rw-r--r-- 1 gistic gistic    34490 Aug  3 14:28 del_genes.conf_90.txt
  -rw-r--r-- 1 gistic gistic    28918 Aug  3 14:28 del_qplot.pdf
  -rw-r--r-- 1 gistic gistic    16939 Aug  3 14:28 del_qplot.png
  -rw-r--r-- 1 gistic gistic   108191 Aug  3 14:25 focal_dat.0.5.mat
  -rw-r--r-- 1 gistic gistic 11412321 Aug  3 14:32 focal_data_by_genes.txt
  -rw-r--r-- 1 gistic gistic     4150 Aug  3 14:29 freqarms_vs_ngenes.pdf
  -rw-r--r-- 1 gistic gistic  2171799 Aug  3 14:24 gistic_inputs.mat
  -rw-r--r-- 1 gistic gistic    18772 Aug  5 10:33 output_log.txt
  -rw-r--r-- 1 gistic gistic    33397 Aug  3 14:28 peak_regs.mat
  -rw-r--r-- 1 gistic gistic   660602 Aug  3 14:28 perm_ads.mat
  -rw-r--r-- 1 gistic gistic   171317 Aug  3 14:28 raw_copy_number.pdf
  -rw-r--r-- 1 gistic gistic    32205 Aug  3 14:28 raw_copy_number.png
  -rw-r--r-- 1 gistic gistic      953 Aug  3 14:28 regions_track.conf_90.bed
  -rw-r--r-- 1 gistic gistic    19015 Jul 20 14:35 run_gistic_example.log
  -rw-r--r-- 1 gistic gistic     3466 Aug  3 14:34 sample_cutoffs.txt
  -rw-r--r-- 1 gistic gistic     2186 Aug  3 14:25 sample_seg_counts.txt
  -rw-r--r-- 1 gistic gistic    92408 Aug  3 14:25 scores.0.5.mat
  -rw-r--r-- 1 gistic gistic   216583 Aug  3 14:28 scores.gistic
  -rw-r--r-- 1 gistic gistic    26324 Aug  3 14:28 wide_peak_regs.mat
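To confirm that a run finished, you can check that some of the key result files in Figure 2 are present and non-empty. The sketch below assumes the conf_90 suffix follows from the -conf 0.90 setting used in the example; the function name check_example_outputs and the choice of files to check are illustrative, not part of gistic.

```shell
# check_example_outputs: hypothetical helper that confirms a few of the result
# files listed in Figure 2 exist and are non-empty in a results directory.
# Usage: check_example_outputs [RESULTS_DIR]   (defaults to example_results)
check_example_outputs() {
  local dir=${1:-example_results} f status=0
  for f in all_lesions.conf_90.txt amp_genes.conf_90.txt \
           del_genes.conf_90.txt scores.gistic; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```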

Using gistic on the Biowulf cluster

Before running any gistic command on the Biowulf cluster, you must set up the gistic environment with the command "module load gistic". In particular:

  1. For an interactive session, run "module load gistic" before any gistic command.
  2. For a single gistic batch job, include "module load gistic" in your batch script before any line that runs gistic.
  3. For a swarm of gistic jobs, each command line in the swarm command file runs independently, so include "module load gistic" on every line that runs gistic (for example, "module load gistic; gistic2 ...").
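Since the module is what puts gistic2 (and run_gistic_example) on your PATH, a script can stop early with a clear message when that step was forgotten. The guard below is a minimal sketch; require_gistic is a hypothetical name and only checks that a command named gistic2 can be found.

```shell
# require_gistic: hypothetical guard that fails when gistic2 is not on PATH,
# most likely because "module load gistic" has not been run.
require_gistic() {
  if ! command -v gistic2 >/dev/null 2>&1; then
    echo 'gistic2 not found; run "module load gistic" first' >&2
    return 1
  fi
}
```

Placed after "module load gistic" in a batch script, require_gistic || exit 1 turns a missing module into an immediate, readable error.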

Running a single Gistic batch job on Biowulf

(See the section of the same name in the samtools documentation.)
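As a sketch of a single batch job that loads the module before calling gistic2, the hypothetical helper below writes a minimal batch script for one run. The output directory and input paths are placeholders supplied as arguments, only a subset of the example's gistic2 options is used, and no Slurm resource directives are included; adjust both to your data and allocation.

```shell
# write_gistic_batch: hypothetical helper that writes a minimal batch script
# for one gistic2 run; "module load gistic" precedes the gistic2 line.
# Usage: write_gistic_batch SCRIPT OUTDIR SEGFILE MARKERSFILE REFGENEFILE
write_gistic_batch() {
  local script=$1 outdir=$2 seg=$3 markers=$4 refgene=$5
  cat > "$script" <<EOF
#!/bin/bash
set -e
# the gistic environment must be loaded before gistic2 is called
module load gistic
mkdir -p "$outdir"
gistic2 -b "$outdir" -seg "$seg" -mk "$markers" -refgene "$refgene" -conf 0.90
EOF
}
```

The generated file can then be submitted with sbatch, e.g. sbatch gistic_job.sh.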

Running a swarm of Gistic jobs

(See the section of the same name in the samtools documentation.)

For more information on running swarm, see swarm.html.
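A swarm command file for gistic holds one independent command per line, each beginning with "module load gistic;". The hypothetical helper below writes such a file with one gistic2 run per segmentation file (for example, one file per cohort), each into its own results directory under the current directory. It is a sketch that assumes paths contain no spaces and uses only a subset of the example's gistic2 options.

```shell
# write_gistic_swarm: hypothetical helper that writes a swarm command file with
# one gistic2 run per segmentation file; every line loads the module itself.
# Assumes paths contain no spaces.
# Usage: write_gistic_swarm SWARMFILE REFGENEFILE SEGFILE...
write_gistic_swarm() {
  local swarmfile=$1 refgene=$2 seg name out
  shift 2
  : > "$swarmfile"   # start from an empty command file
  for seg in "$@"; do
    name=$(basename "$seg" .txt)        # e.g. cohortA.txt -> cohortA
    out="$PWD/results_$name"            # per-file output directory
    printf 'module load gistic; mkdir -p %s; gistic2 -b %s -seg %s -refgene %s -conf 0.90\n' \
      "$out" "$out" "$seg" "$refgene" >> "$swarmfile"
  done
}
```

The resulting file is submitted with swarm -f, e.g. swarm -f gistic.swarm.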

Running an interactive Gistic job on Biowulf

(See the section of the same name in the samtools documentation.)