High-Performance Computing at the NIH
INRICH on Biowulf

Description

INRICH is a pathway analysis tool for genome-wide association studies. It is designed to detect enriched association signals of LD-independent genomic regions within biologically relevant gene sets.

Reference

Web sites

On Helix

First, load the INRICH module to set up the environment:

[user@helix ~]$ module load inrich
[+] Loading inrich 1.1

Now run INRICH on the test data found in /usr/local/apps/inrich/TestData/. Copy the data to your own space first; this example copies it to /scratch/inrich/TestData.

[user@helix ~]$ mkdir -p /scratch/inrich/TestData
[user@helix ~]$ cp /usr/local/apps/inrich/TestData/* /scratch/inrich/TestData
[user@helix ~]$ cd /scratch/inrich/TestData/
[user@helix TestData]$ inrich \
	-g entrez.gene.map \
	-m hap3.snp.map \
	-t go.set \
	-a test.p.0.0001.int
----------------------------------------------------------------
INRICH v.1.0 : Thu Sep  3 16:06:12 2015
http://atgu.mgh.harvard.edu/inrich
----------------------------------------------------------------
       project-title  (-o)  :  test
        test-regions  (-a)  :  test.p.0.0001.int
           gene-list  (-g)  :  entrez.gene.map
     background-genes (-b)  :  --no background set--
          range-file  (-x)  :  --no ranges--
         target-file  (-t)  :  go.set
         compact      (-k)  :  YES
 target size-filter  (-i,j) :  2..all
    min-obs threshold (-z)  :  2
           test-type  (-1)  :  INTERVALS
       top-N-regions  (-n)  :   --all--
           kb-window  (-w)  :  0
            map-file  (-m)  :  hap3.snp.map
       match-density  (-d)  :  0.1
         pre-compute  (-c)  :  YES
         match-genes  (-e)  :  YES
      num-replicates  (-r)  :  5000
      num-bootstraps  (-q)  :  5000
         random-seed  (-f)  :  1551254250
           display-p  (-p)  :  0.05
   printPermutations  (-s)  :  NO
----------------------------------------------------------------

3090.78Mb total sequence length on 25 chromosomes
  read 1023882 map positions
  read 17529 reference genes
  read 16195 unique genes/targets, in 9834 groups
  157974 total gene/group pairs
  1562 genes/targets not found in main gene-list
  after size filters, 6582 groups remaining
  read 50 intervals
  25 test elements on gene regions
  25 non-genic test elements dropped

After merging, 16531 non-overlapping reference genes
After merging, 16297 non-overlapping genes in target sets
After merging, 20 non-overlapping intervals

1023882 SNP counts assigned
Precomputing acceptable positions 20
5000 first-pass permutations ( completed )
5000 second-pass permutations ( completed )

Total of 20 intervals tested for 6582 target sets

Proportion unplaced/self-placed interval/permutations = 0.00279


----------------------------------------------------------------
T_Size  Int_No    Empirical_P Corrected_P     Target
----------------------------------------------------------------
    69       2       0.014997    0.957409     GO:0003777 microtubule motor activity
    66       2     0.00539892    0.837632     GO:0004221 ubiquitin thiolesterase activity
    63       2      0.0155969    0.959608     GO:0005178 integrin binding
   101       2      0.0329934    0.994801     GO:0006979 response to oxidative stress
    79       2      0.0161968    0.962008     GO:0007018 microtubule-based movement
    57       2      0.0143971    0.954409     GO:0007229 integrin-mediated signaling pathway
    88       2      0.0355929    0.995001     GO:0008237 metallopeptidase activity
   105       2       0.015197    0.958208     GO:0043065 positive regulation of apoptosis
   118       2      0.0359928    0.995401     GO:0045893 positive regulation of transcription, DNA-dependent



----------------------------------------------------------------
Target_P_Threshold     Uniq_Gene_No_in_Targets      Significance
----------------------------------------------------------------
             0.001                           0                 1
              0.01                           2          0.915017
              0.05                           9           0.90122

Writing output to test.out.inrich ...
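
The results table printed above can be filtered with standard tools once it has been saved to a file. The following is a minimal sketch, assuming you captured the screen output yourself (for example by appending "| tee inrich.log" to the inrich command); the function name inrich_hits and the log file name are hypothetical and not part of INRICH. It prints the rows of the target table whose Empirical_P value (third column) is below a chosen cutoff.

```shell
#!/bin/bash
# Hypothetical helper: print target-table rows with Empirical_P below a cutoff.
# Usage: inrich_hits LOGFILE [CUTOFF]   (CUTOFF defaults to 0.05)
inrich_hits() {
    local log=$1 cutoff=${2:-0.05}
    awk -v cutoff="$cutoff" '
        # the target table starts after the header line beginning with T_Size
        $1 == "T_Size"             { in_table = 1; next }
        # the summary table that follows is not wanted
        $1 == "Target_P_Threshold" { in_table = 0 }
        # data rows have at least five fields; column 3 is Empirical_P
        in_table && NF >= 5 && $3 + 0 < cutoff + 0 { print }
    ' "$log"
}
```

For the run shown above, "inrich_hits inrich.log 0.01" would print only the GO:0004221 row, since that is the only target with an empirical p-value below 0.01.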

Batch job on Biowulf

The following example uses the test data in /usr/local/apps/inrich/TestData. Copy the data to your own space first; here we assume you have placed it in /home/$USER/TestData. Then create a batch script (for example, inrich_batch.sh) like the following:

#! /bin/bash
set -e
module load inrich
cd ~/TestData # or wherever you put your data
inrich \
	-g entrez.gene.map \
	-m hap3.snp.map \
	-t go.set \
	-a test.p.0.0001.int

and submit it to the queue with

[user@biowulf ~]$ sbatch inrich_batch.sh
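
Because a queued job can wait some time before it starts, it can be worth confirming that the input files are in place before submitting. The following is a minimal sketch of such a check; the function name check_inrich_inputs is hypothetical, and the file names are simply those of the test data used above.

```shell
#!/bin/bash
# Hypothetical pre-flight check: verify the four test-data inputs are readable.
# Usage: check_inrich_inputs [DIR]   (DIR defaults to the current directory)
check_inrich_inputs() {
    local dir=${1:-.} f missing=0
    for f in entrez.gene.map hap3.snp.map go.set test.p.0.0001.int; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f" >&2
            missing=1
        fi
    done
    # return 0 only if every file was found
    return "$missing"
}
```

It could be combined with the submission step as "check_inrich_inputs ~/TestData && sbatch inrich_batch.sh", so the job is submitted only when all four files are found.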

Swarm of jobs on Biowulf

INRICH names its output files automatically, based on the name of the associated interval file (the -a argument). If your associated interval files all have different names, you can simply create a swarm file (for example, inrich.swarm) with one command per input set, like this one:

inrich -g 1_entrez.gene.map \
	-m 1_hap3.snp.map \
	-t 1_go.set \
	-a 1_test.p.0.0001.int
inrich -g 2_entrez.gene.map \
	-m 2_hap3.snp.map \
	-t 2_go.set \
	-a 2_test.p.0.0001.int

and submit it to the queue with

[user@biowulf ~]$ swarm --module inrich -f inrich.swarm 
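
With more than a handful of input sets, the swarm file can be generated rather than typed by hand. The following is a minimal sketch, assuming the input files follow the numbered-prefix naming used in the example above (1_, 2_, and so on); adjust the list of prefixes and the file name patterns to match your own data. Each command is written on a single line, which is equivalent to the backslash-continued form shown earlier.

```shell
#!/bin/bash
# Sketch: write one inrich command per line for each numbered input set.
# The prefixes (1 2) and file name patterns are assumptions taken from the example.
swarmfile=inrich.swarm
: > "$swarmfile"    # create or truncate the swarm file
for i in 1 2; do
    printf 'inrich -g %s_entrez.gene.map -m %s_hap3.snp.map -t %s_go.set -a %s_test.p.0.0001.int\n' \
        "$i" "$i" "$i" "$i" >> "$swarmfile"
done
```

The resulting inrich.swarm can be submitted with the same swarm command shown above.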

If you run a swarm in which several associated interval files have the same name, the jobs may overwrite one another's output files. One strategy is to write a wrapper that creates a subdirectory for each job, named by one of its arguments. Identically named output files then no longer collide, because each job writes into its own identifiable location. An example wrapper might look like this:

#! /bin/bash
set -e

# make a subdirectory to work in (named by the fifth argument)
wd=$(pwd)
subdir=$5
fulldir="${wd}/${subdir}"
mkdir -p "${fulldir}"
cd "${fulldir}"

# now run inrich on the data in the parent directory
module load inrich
inrich \
	-g "${wd}/${1}" \
	-m "${wd}/${2}" \
	-t "${wd}/${3}" \
	-a "${wd}/${4}"

The swarm file would then look something like this (assuming the wrapper is executable and sits in the user's current directory along with the data):

./inrich_wrapper.sh 1_entrez.gene.map \
	1_hap3.snp.map \
	1_go.set \
	test.p.0.0001.int \
	directory_1
./inrich_wrapper.sh 2_entrez.gene.map \
	2_hap3.snp.map \
	2_go.set \
	test.p.0.0001.int \
	directory_2

and the job would be submitted like so.

[user@biowulf ~]$ swarm -f inrich_wrapper.swarm 
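
Once the swarm has finished, it is useful to confirm that every subdirectory actually received an output file. The following is a minimal sketch; the function name report_outputs is hypothetical, the directory_N naming follows the swarm file above, and the output name test.out.inrich is an assumption based on the Helix run, where the interval file test.p.0.0001.int produced a file of that name.

```shell
#!/bin/bash
# Hypothetical helper: report which wrapper subdirectories contain output.
# Usage: report_outputs [BASEDIR]   (BASEDIR defaults to the current directory)
report_outputs() {
    local base=${1:-.} d
    for d in "$base"/directory_*/; do
        [ -d "$d" ] || continue    # skip if the glob matched nothing
        # assumed output name, taken from the Helix example above
        if [ -s "${d}test.out.inrich" ]; then
            echo "OK      ${d%/}"
        else
            echo "MISSING ${d%/}"
        fi
    done
}
```

Running "report_outputs" from the directory that holds the wrapper prints one line per subdirectory, flagging any job whose output file is absent or empty.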

Interactive job on Biowulf

To run INRICH interactively on Biowulf, first allocate an interactive session:

[user@biowulf ~]$ sinteractive
salloc.exe: Granted job allocation 1781493
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0228 are ready for job
srun: error: x11: no local DISPLAY defined, skipping

Then follow the directions in the "On Helix" section above.

Documentation