High-Performance Computing at the NIH
SNP2HLA on Biowulf & Helix

SNP2HLA is a tool to impute amino acid polymorphisms and single nucleotide polymorphisms in human leukocyte antigens (HLA) within the major histocompatibility complex (MHC) region on chromosome 6.

The unique feature of SNP2HLA is that it imputes not only the classical HLA alleles but also the amino acid sequences of those classical alleles, so that individual amino acid sites can be directly tested for association. This allows for facile amino-acid focused downstream analysis.

SNP2HLA also provides a companion package, MakeReference, which builds a reference panel that can be used with SNP2HLA. It is intended for cases where the provided reference panel is inappropriate (e.g. a different population) and users want to build their own reference panel, e.g. by typing the HLA alleles in a subset of individuals.

SNP2HLA was developed by Sherman Jia and Buhm Han in the labs of Soumya Raychaudhuri and Paul de Bakker at the Brigham and Women's Hospital, Harvard Medical School, and the Broad Institute.

Running on Helix

$ module load snp2hla
$ cd /data/$USER/
$ mkdir snp2hla; cp -r /usr/local/apps/snp2HLA/1.0.3/SNP2HLA/Example ./snp2hla
$ cd /data/$USER/snp2hla/Example
$ SNP2HLA.csh 1958BC HM_CEU_REF 1958BC_IMPUTED plink 2000 1000

Note: the '2000' in the example command is passed by SNP2HLA.csh to Java as -Xmx2000m, which sets a maximum heap size of 2000 MB. If your analysis needs more memory, increase this value in the command accordingly.
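The example command can be read more easily with each positional argument named. The sketch below assumes the argument order given in the SNP2HLA usage message (data prefix, reference prefix, output prefix, plink, then the optional maximum memory in MB and window size); the variable names are illustrative only and are not part of SNP2HLA. It only prints the command rather than running it.

```shell
#!/bin/bash
# Sketch: the example command with each positional argument named.
# Variable names are hypothetical; the meanings in the comments are
# assumptions based on the SNP2HLA usage message.
data=1958BC              # prefix of the input PLINK files
reference=HM_CEU_REF     # prefix of the reference panel files
output=1958BC_IMPUTED    # prefix for the output files
plink=plink              # PLINK executable to use
max_mem_mb=2000          # passed to Java as -Xmx2000m
window=1000              # sixth argument, assumed to be the window size

cmd="SNP2HLA.csh $data $reference $output $plink $max_mem_mb $window"
echo "$cmd"
```

Changing max_mem_mb is then the only edit needed to give Java a larger heap.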

Running a single batch job on Biowulf

1. Create a script file similar to the lines below.

#!/bin/bash

module load snp2hla
cd /data/$USER/snp2hla/Example
SNP2HLA.csh 1958BC HM_CEU_REF 1958BC_IMPUTED plink 4000 1000

2. Submit the script on biowulf:

$ sbatch jobscript

To request more memory than the default (4 GB), use the --mem flag:

$ sbatch --mem=10g jobscript

Note: the '4000' in the example command is passed by SNP2HLA.csh to Java as -Xmx4000m, which sets a maximum heap size of 4000 MB. If your analysis needs more memory, increase this value in the command accordingly.
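Because the heap value in the command and the --mem request are set separately, it can help to derive one from the other. The following is a minimal sketch, not part of SNP2HLA: it assumes Slurm exports SLURM_MEM_PER_NODE (in MB) when --mem is given, and the heap_mb helper, the 4096 MB fallback, the 1024 MB headroom, and the 512 MB floor are all illustrative assumptions. It prints the resulting command instead of running it.

```shell
#!/bin/bash
# Sketch: choose the SNP2HLA heap argument from the Slurm memory allocation.
# heap_mb is a hypothetical helper; the headroom and floor values are
# assumptions, chosen to leave room for Java's non-heap memory use.

heap_mb() {
    # $1 = allocated memory in MB, $2 = headroom in MB (default 1024)
    hm_alloc=$1
    hm_headroom=${2:-1024}
    hm_heap=$(( hm_alloc - hm_headroom ))
    # Keep a small floor so a tiny allocation does not yield a negative heap.
    if [ "$hm_heap" -lt 512 ]; then
        hm_heap=512
    fi
    echo "$hm_heap"
}

# SLURM_MEM_PER_NODE is in MB; fall back to 4096 outside a Slurm job.
alloc=${SLURM_MEM_PER_NODE:-4096}
heap=$(heap_mb "$alloc")
echo "SNP2HLA.csh 1958BC HM_CEU_REF 1958BC_IMPUTED plink $heap 1000"
```

In a real job script the final echo would be replaced by the command itself, after module load snp2hla.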

Running a swarm of jobs on Biowulf

Set up a swarm command file:

  cd /data/$USER/dir1; SNP2HLA.csh 1958BC HM_CEU_REF 1958BC_IMPUTED plink 1500 1000
  cd /data/$USER/dir2; SNP2HLA.csh 1958BC HM_CEU_REF 1958BC_IMPUTED plink 1500 1000
  cd /data/$USER/dir3; SNP2HLA.csh 1958BC HM_CEU_REF 1958BC_IMPUTED plink 1500 1000
	[......]

Submit the swarm file:

  $ swarm -f swarmfile --module snp2hla

-f: specify the swarm command file
--module: load the given module(s) for each command line in the file

To allocate more memory, use the -g flag:

  $ swarm -f swarmfile -g 4 --module snp2hla

-g: memory, in GB, allocated to each command line in the file

For more information regarding running swarm, see swarm.html

Note: the '1500' in the example commands is passed by SNP2HLA.csh to Java as -Xmx1500m, which sets a maximum heap size of 1500 MB. If your analysis needs more memory, increase this value in the commands accordingly.
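When there are many data directories, the swarm command file can be generated rather than typed by hand. The sketch below is one way to do that; make_swarm_lines is a hypothetical helper, and it assumes every directory holds input files with the same prefixes as the example above. The base path is single-quoted so that $USER is written literally into each line, as in the swarm command file shown earlier.

```shell
#!/bin/bash
# Sketch: print one swarm command line per data directory.
# make_swarm_lines is a hypothetical helper, not part of swarm or SNP2HLA.

make_swarm_lines() {
    # $1 = base directory, $2 = Java heap in MB, remaining args = subdirectories
    sw_base=$1
    sw_heap=$2
    shift 2
    for sw_dir in "$@"; do
        echo "cd $sw_base/$sw_dir; SNP2HLA.csh 1958BC HM_CEU_REF 1958BC_IMPUTED plink $sw_heap 1000"
    done
}

# Single quotes keep $USER unexpanded in the generated lines.
make_swarm_lines '/data/$USER' 1500 dir1 dir2 dir3
```

Redirecting the output to a file (for example, > swarmfile) gives a command file that can be submitted with swarm -f swarmfile --module snp2hla.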

Running an interactive job on Biowulf

It may be useful for debugging purposes to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead allocate an interactive node as described below, and run the interactive job there.

biowulf$ sinteractive 
salloc.exe: Granted job allocation 16535

cn999$ module load snp2hla
cn999$ cd /data/$USER/snp2hla/Example
cn999$ SNP2HLA.csh 1958BC HM_CEU_REF 1958BC_IMPUTED plink 4000 1000
cn999$ exit
exit

biowulf$

Make sure to exit the job once finished.

If more memory is needed, use the --mem flag. For example:

biowulf$ sinteractive --mem=10g

Note: the '4000' in the example command is passed by SNP2HLA.csh to Java as -Xmx4000m, which sets a maximum heap size of 4000 MB. If your analysis needs more memory, increase this value in the command accordingly.

Documentation

http://www.broadinstitute.org/mpg/snp2hla/