seq2HLA on Biowulf

seq2HLA computationally determines the human leukocyte antigen (HLA) genotype of a sample. It takes standard paired-end RNA-Seq reads in fastq format as input, maps them with bowtie against an index comprising all HLA alleles, and outputs the most likely HLA class I and class II genotypes (at 4-digit resolution), a p-value for each call, and the locus-specific expression (in RPKM).
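
seq2HLA works on a pair of mate files, and in the sample session below it reports the read length it detected in the input. Before submitting a large job, it can help to confirm that both mate files hold the same number of reads and to check the read length yourself. The following is a minimal sketch, not part of seq2HLA: the function name `check_pair` is hypothetical, and it assumes gzipped fastq with exactly four lines per record.

```shell
# Hypothetical pre-flight check (not part of seq2HLA).
# Assumes gzipped fastq with exactly four lines per record.
check_pair() {
    local r1="$1" r2="$2"
    local n1 n2 len
    # Number of reads = number of lines / 4.
    n1=$(gzip -dc "$r1" | awk 'END { print NR / 4 }')
    n2=$(gzip -dc "$r2" | awk 'END { print NR / 4 }')
    # The second line of the file is the sequence of the first read.
    len=$(gzip -dc "$r1" | awk 'NR == 2 { print length($0); exit }')
    echo "reads_1=$n1 reads_2=$n2 read_length=$len"
    if [ "$n1" != "$n2" ]; then
        echo "error: mate files contain different numbers of reads" >&2
        return 1
    fi
}
```

For example, `check_pair SRR4300096_1.fastq.gz SRR4300096_2.fastq.gz`. Each file is decompressed in full, so on real samples this takes a while; run it on a compute node rather than the login node.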


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session:

[user@biowulf]$ sinteractive --cpus-per-task 8 --mem 32g --gres lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load seq2hla
[user@cn3144 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144 46116226]$ cp $SEQ2HLA_HOME/test/*.fastq.gz .
[user@cn3144 46116226]$ seq2HLA -1 SRR4300096_1.fastq.gz -2 SRR4300096_2.fastq.gz -r test -p $SLURM_CPUS_PER_TASK
Now running seq2HLA version 2.3!
Input is a gipped file .....
The read length of your input fastq was determined to be 100, so 2 mismatches will be allowed and 8 threads will be used by bowtie.
----------HLA class I------------
>---classical HLA alleles---
First iteration starts....
Mapping ......
# reads processed: 34632623
# reads with at least one reported alignment: 10689 (0.03%)
# reads that failed to align: 34621934 (99.97%)
Reported 853322 paired-end alignments to 1 output stream(s)

Calculation of first digital haplotype.....
The digital haplotype is written into test-ClassI-class.digitalhaplotype1
1st iteration done.
Now removing reads that mapped to the three top-scoring groups .......
Second iterations starts .....
 Mapping ......
# reads processed: 4332
# reads with at least one reported alignment: 4332 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 363543 paired-end alignments to 1 output stream(s)
Calculation of second digital haplotype.....
The digital haplotype is written into test-ClassI-class.digitalhaplotype2
2nd iteration done.
-----------2 digit typing results-------------
#Locus	Allele 1	Confidence	Allele 2	Confidence
A	A*25	0.0006404339	A*01	0.01656387
B	B*08	0.0008702224	B*18	1.98204e-05
C	C*12	7.154939e-05	C*07	0.006626664
Calculation of locus-specific expression ...
test-ClassI-class.bowtielog
A: 76.22 RPKM
C: 91.62 RPKM
B: 96.65 RPKM
-----------4 digit typing results-------------
#Locus	Allele 1	Confidence	Allele 2	Confidence
A	A*25:01	0.0006404339	A*01:01	0.0179885
B	B*08:01	0.0008702224	B*18:01	1.98204e-05
C	C*12:03	7.51191e-05	C*07:01'	0.006944861
----------HLA class I------------
>---nonclassical HLA alleles---
First iteration starts....
Mapping ......
# reads processed: 34632623
# reads with at least one reported alignment: 4465 (0.01%)
# reads that failed to align: 34628158 (99.99%)
Reported 35369 paired-end alignments to 1 output stream(s)

Calculation of first digital haplotype.....
The digital haplotype is written into test-ClassI-nonclass.digitalhaplotype1
1st iteration done.
Now removing reads that mapped to the three top-scoring groups .......
Second iterations starts .....
 Mapping ......
Warning: Could not find any reads in "test-ClassI-nonclass-2nditeration_1.fq"
Warning: Could not find any reads in "test-ClassI-nonclass-2nditeration_2.fq"
# reads processed: 0
# reads with at least one reported alignment: 0 (0.00%)
# reads that failed to align: 0 (0.00%)
No alignments
Calculation of second digital haplotype.....
The digital haplotype is written into test-ClassI-nonclass.digitalhaplotype2
2nd iteration done.
-----------2 digit typing results-------------
#Locus	Allele 1	Confidence	Allele 2	Confidence
E	E*01	NA	hoz("E*01")	NA
F	F*01	NA	hoz("F*01")	NA
G	no	NA	hoz("G*01")	NA
H	H*02	0	hoz("H*02")	NA
J	J*01	NA	hoz("J*01")	NA
K	K*01	NA	hoz("K*01")	NA
L	L*01	NA	hoz("L*01")	NA
P	no	NA	hoz("P*02")	NA
V	no	NA	hoz("V*01")	NA
Calculation of locus-specific expression ...
test-ClassI-nonclass.bowtielog
E: 115.75 RPKM
G: 0.0 RPKM
F: 18.44 RPKM
H: 25.7 RPKM
K: 0.16 RPKM
J: 0.03 RPKM
L: 0.11 RPKM
P: 0.0 RPKM
V: 0.0 RPKM
# reads processed: 4465
# reads with at least one reported alignment: 4465 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 35369 paired-end alignments to 1 output stream(s)
The digital haplotype is written into test-ClassI-nonclass.digitalhaplotype3
-----------4 digit typing results-------------
#Locus	Allele 1	Confidence	Allele 2	Confidence
E	E*01:01	NA	E*01:01	NA
F	F*01:01'	NA	F*01:01	NA
G	no	NA	no	NA
H	H*02:06	0.0	H*02:06	NA
J	J*01:01	NA	J*01:01	NA
K	K*01:01	NA	K*01:01	NA
L	L*01:01	NA	L*01:01	NA
P	no	NA	no	NA
V	no	NA	no	NA
----------HLA class II------------
First iteration starts....
Mapping ......
# reads processed: 34632623
# reads with at least one reported alignment: 1889 (0.01%)
# reads that failed to align: 34630734 (99.99%)
Reported 57508 paired-end alignments to 1 output stream(s)

Calculation of first digital haplotype.....
The digital haplotype is written into test-ClassII.digitalhaplotype1
1st iteration done.
Now removing reads that mapped to the three top-scoring groups .......
Second iterations starts .....
 Mapping ......
# reads processed: 395
# reads with at least one reported alignment: 395 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 19773 paired-end alignments to 1 output stream(s)
Calculation of second digital haplotype.....
The digital haplotype is written into test-ClassII.digitalhaplotype2
2nd iteration done.
-----------2 digit typing results-------------
#Locus	Allele 1	Confidence	Allele 2	Confidence
DQA1	DQA1*03	1.534371e-05	DQA1*05	0.2468488
DQB1	DQB1*02	0.2004472	DQB1*03	0.3331947
DRB1	DRB1*04	0.0006670405	DRB1*03	0.04627267
DRA	DRA*01	NA	hoz("DRA*01")	NA
DPA1	DPA1*01	0.3854021	hoz("DPA1*02")	0.0574801
DPB1	DPB1*105	1.045282e-07	DPB1*04	0.1606171
Calculation of locus-specific expression ...
test-ClassII.bowtielog
DQB1: 2.54 RPKM
DQA1: 6.5 RPKM
DRB1: 26.2 RPKM
DPB1: 12.76 RPKM
DRA: 40.35 RPKM
DPA1: 17.21 RPKM
# reads processed: 1889
# reads with at least one reported alignment: 1889 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 57508 paired-end alignments to 1 output stream(s)
The digital haplotype is written into test-ClassII.digitalhaplotype3
-----------4 digit typing results-------------
#Locus	Allele 1	Confidence	Allele 2	Confidence
DQA1	DQA1*03:02'	1.534371e-05	DQA1*05:01	0.2468488
DQB1	DQB1*02:01'	0.2004472	DQB1*03:03'	0.3331947
DRB1	DRB1*04:01'	0.0006670405	DRB1*03:01	0.04627267
DRA	DRA*01:01	NA	DRA*01:01	NA
DPA1	DPA1*01:03	0.3854021	DPA1*01:03	0.0574801
DPB1	DPB1*105:01	1.045282e-07	DPB1*04:01	0.1606171

[user@cn3144 46116226]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
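
Each typing table in the session output lists, per locus, two alleles and a Confidence value, which this page describes as a p-value for each call. If you save a table to a file with its tab separators intact, a short awk filter can flag calls worth a second look. This is a minimal sketch under stated assumptions: the function name `flag_calls` and the 0.05 cutoff are illustrative, not seq2HLA recommendations; it treats smaller values as more confident, and treats NA (as reported for the nonclassical loci above) as needing review.

```shell
# Hypothetical helper (not part of seq2HLA): read a saved typing table
# ("#Locus  Allele 1  Confidence  Allele 2  Confidence", tab-separated)
# and print one line per allele call, marked "ok" or "CHECK".
# Assumption: Confidence is a p-value, so NA or values >= alpha are flagged.
flag_calls() {
    local table="$1"
    local alpha="${2:-0.05}"   # illustrative cutoff, not a seq2HLA default
    awk -F'\t' -v alpha="$alpha" '
        /^#/ { next }            # skip the header line
        NF < 5 { next }          # skip anything that is not a table row
        {
            # Columns 2/3 hold allele 1 and its value, 4/5 allele 2 and its value.
            for (i = 2; i <= 4; i += 2) {
                allele = $i
                p = $(i + 1)
                status = (p == "NA" || p + 0 >= alpha + 0) ? "CHECK" : "ok"
                printf "%s\t%s\t%s\t%s\n", $1, allele, p, status
            }
        }' "$table"
}
```

With the default cutoff, every classical class I call in the 4-digit table above would be marked ok, while both DQB1 calls in the class II table (0.2004472 and 0.3331947) would be marked CHECK.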

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. seq2hla.sh). For example:

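#!/bin/bash
# Stop at the first failing command
set -e
module load seq2hla
# -r sets the prefix of the output files; -p uses all CPUs allocated to the job
seq2HLA -1 SRR4300096_1.fastq.gz -2 SRR4300096_2.fastq.gz -r test -p "$SLURM_CPUS_PER_TASK"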

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=# [--mem=#] seq2hla.sh
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. seq2hla.swarm). For example:

seq2HLA -1 sample1_1.fastq.gz -2 sample1_2.fastq.gz -r sample1 -p $SLURM_CPUS_PER_TASK
seq2HLA -1 sample2_1.fastq.gz -2 sample2_2.fastq.gz -r sample2 -p $SLURM_CPUS_PER_TASK
seq2HLA -1 sample3_1.fastq.gz -2 sample3_2.fastq.gz -r sample3 -p $SLURM_CPUS_PER_TASK
seq2HLA -1 sample4_1.fastq.gz -2 sample4_2.fastq.gz -r sample4 -p $SLURM_CPUS_PER_TASK
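
Writing a swarmfile by hand gets tedious with many samples. A minimal sketch that generates one line per sample, assuming mate files are named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz` as in the example above and that file names contain no spaces; `make_swarmfile` is a hypothetical helper, not part of seq2HLA or swarm:

```shell
# Hypothetical helper: print one seq2HLA command per sample found in a
# directory, pairing <sample>_1.fastq.gz with <sample>_2.fastq.gz.
make_swarmfile() {
    local dir="${1:-.}"
    local r1 r2 sample
    for r1 in "$dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue            # glob matched nothing
        sample=$(basename "$r1" _1.fastq.gz)
        r2="$dir/${sample}_2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "skipping $sample: no mate file $r2" >&2
            continue
        fi
        # $SLURM_CPUS_PER_TASK stays literal so that it is expanded
        # inside each subjob at run time.
        printf 'seq2HLA -1 %s -2 %s -r %s -p $SLURM_CPUS_PER_TASK\n' "$r1" "$r2" "$sample"
    done
}
```

For example, run `make_swarmfile . > seq2hla.swarm` in the directory holding the fastq files. Samples without a mate-2 file are reported on stderr and left out of the swarmfile.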

Submit this job using the swarm command.

swarm -f seq2hla.swarm [-g #] -t # --module seq2hla
where
  -g #              Number of gigabytes of memory required for each process (1 line in the swarm command file)
  -t #              Number of threads/CPUs required for each process (1 line in the swarm command file)
  --module seq2hla  Loads the seq2HLA module for each subjob in the swarm