Biowulf High Performance Computing at the NIH
HLA-PRG-LA on Biowulf

HLA*PRG:LA stands for "HLA*PRG, linear approximation". It approximates the graph alignment process by starting from linear sequence alignments. This reduces the per-sample resource requirements for HLA typing to 30 GB of RAM and 30 CPU hours while still producing highly accurate calls. HLA*PRG:LA was developed by Alexander Dilthey at NHGRI. [Description of the algorithm]

Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --gres=lscratch:150 --cpus-per-task=8 --mem=100g 
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load HLA-PRG-LA

[user@cn3144 ~]$ cd /lscratch/$SLURM_JOBID

[user@cn3144 ~]$ cp /usr/local/apps/HLA-PRG-LA/testdata/  .

[user@cn3144 ~]$ samtools index

[user@cn3144 ~]$ --BAM \
      --graph PRG_MHC_GRCh38_withIMGT --sampleID NA12878 \
      --maxThreads 7 --workingDir .
[+] Loading HLA-PRG-LA f0833ed on cn3144
Using 7 CPUS

Identified paths:
    samtools_bin: /usr/local/apps/samtools/1.3/bin/samtools
    bwa_bin: /usr/local/apps/bwa/0.7.12/bwa
    java_bin: /usr/bin/java
    picard_sam2fastq_bin: /usr/local/apps/picard/1.119/SamToFastq.jar
    General working directory: /lscratch/43090316
    Sample-specific working directory: /lscratch/43090316/NA12878

Extract reads from 534 regions...
Extract unmapped reads...
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
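
The session above works inside the job's node-local scratch directory, /lscratch/$SLURM_JOBID. As a minimal sketch, the lookup of that directory can be wrapped in a small guard so that a script fails early when it is run outside a job allocation. The function name job_scratch_dir is hypothetical and is not part of the HLA-PRG-LA module or of Slurm.

```shell
# Hypothetical helper (not provided by Biowulf or HLA-PRG-LA):
# print the node-local scratch directory of the current Slurm job.
job_scratch_dir() {
    # SLURM_JOBID is set by Slurm inside an allocation; refuse to guess otherwise.
    if [ -z "${SLURM_JOBID:-}" ]; then
        echo "SLURM_JOBID is not set; run this inside a job allocation" >&2
        return 1
    fi
    echo "/lscratch/${SLURM_JOBID}"
}

# Example use inside a job:
#   cd "$(job_scratch_dir)" || exit 1
```

Inside the sample allocation shown above, the function would print /lscratch/46116226. Outside a job it prints a message to standard error and returns a non-zero status.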

Batch job
Most jobs should be run as batch jobs.

Create a batch input file. For example:

#!/bin/bash
set -e
cd /lscratch/$SLURM_JOBID
module load HLA-PRG-LA

cp /data/$USER/myfile.cram .
samtools index myfile.cram

cpus=$(( SLURM_CPUS_PER_TASK - 1 ))
echo "Running on $cpus CPUs" --BAM myfile.cram --graph PRG_MHC_GRCh38_withIMGT --sampleID myfile --maxThreads $cpus --workingDir .

# copy output from /lscratch back to /data area
cp -r myfile/hla  /data/$USER/

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=32 --mem=100g --gres=lscratch:100 --time=1-00:00:00
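
The batch script passes one thread fewer than the number of allocated CPUs to --maxThreads, which matches the interactive session (8 CPUs allocated, 7 threads used). The sketch below shows that arithmetic with two added safeguards that are assumptions of this example, not part of the original script: a fallback when SLURM_CPUS_PER_TASK is unset, and a floor of one thread. The function name max_threads is hypothetical.

```shell
# Hypothetical helper: derive a thread count from the Slurm CPU allocation,
# leaving one CPU free as the batch script above does.
max_threads() (
    # Assumption: treat an unset SLURM_CPUS_PER_TASK as a single CPU.
    allocated="${SLURM_CPUS_PER_TASK:-1}"
    threads=$(( allocated - 1 ))
    # Never report fewer than one thread.
    [ "$threads" -ge 1 ] || threads=1
    echo "$threads"
)

# Example use inside the batch script:
#   cpus=$(max_threads)
```

With the sbatch options shown above (--cpus-per-task=32), the function prints 31. The parentheses around the function body run it in a subshell, so the helper variables do not leak into the calling script.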