seq2HLA computationally determines the human leukocyte antigen (HLA) genotype of a sample. It takes standard RNA-Seq reads in FASTQ format as input, maps them against a bowtie index comprising all HLA alleles, and outputs the most likely HLA class I and class II genotypes (at 4-digit resolution), a p-value for each call, and the expression of each class.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive --cpus-per-task 8 --mem 32g --gres lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load seq2hla
[user@cn3144 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144 46116226]$ cp $SEQ2HLA_HOME/test/*.fastq.gz .
[user@cn3144 46116226]$ seq2HLA -1 SRR4300096_1.fastq.gz -2 SRR4300096_2.fastq.gz -r test -p $SLURM_CPUS_PER_TASK
Now running seq2HLA version 2.3!
Input is a gipped file .....
The read length of your input fastq was determined to be 100, so 2 mismatches will be allowed and 8 threads will be used by bowtie.
----------HLA class I------------
>---classical HLA alleles---
First iteration starts....
Mapping ......
# reads processed: 34632623
# reads with at least one reported alignment: 10689 (0.03%)
# reads that failed to align: 34621934 (99.97%)
Reported 853322 paired-end alignments to 1 output stream(s)
Calculation of first digital haplotype.....
The digital haplotype is written into test-ClassI-class.digitalhaplotype1
1st iteration done.
Now removing reads that mapped to the three top-scoring groups .......
Second iterations starts .....
Mapping ......
# reads processed: 4332
# reads with at least one reported alignment: 4332 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 363543 paired-end alignments to 1 output stream(s)
Calculation of second digital haplotype.....
The digital haplotype is written into test-ClassI-class.digitalhaplotype2
2nd iteration done.
-----------2 digit typing results-------------
#Locus Allele 1 Confidence Allele 2 Confidence
A A*25 0.0006404339 A*01 0.01656387
B B*08 0.0008702224 B*18 1.98204e-05
C C*12 7.154939e-05 C*07 0.006626664
Calculation of locus-specific expression ...
test-ClassI-class.bowtielog
A: 76.22 RPKM
C: 91.62 RPKM
B: 96.65 RPKM
-----------4 digit typing results-------------
#Locus Allele 1 Confidence Allele 2 Confidence
A A*25:01 0.0006404339 A*01:01 0.0179885
B B*08:01 0.0008702224 B*18:01 1.98204e-05
C C*12:03 7.51191e-05 C*07:01' 0.006944861
----------HLA class I------------
>---nonclassical HLA alleles---
First iteration starts....
Mapping ......
# reads processed: 34632623
# reads with at least one reported alignment: 4465 (0.01%)
# reads that failed to align: 34628158 (99.99%)
Reported 35369 paired-end alignments to 1 output stream(s)
Calculation of first digital haplotype.....
The digital haplotype is written into test-ClassI-nonclass.digitalhaplotype1
1st iteration done.
Now removing reads that mapped to the three top-scoring groups .......
Second iterations starts .....
Mapping ......
Warning: Could not find any reads in "test-ClassI-nonclass-2nditeration_1.fq"
Warning: Could not find any reads in "test-ClassI-nonclass-2nditeration_2.fq"
# reads processed: 0
# reads with at least one reported alignment: 0 (0.00%)
# reads that failed to align: 0 (0.00%)
No alignments
Calculation of second digital haplotype.....
The digital haplotype is written into test-ClassI-nonclass.digitalhaplotype2
2nd iteration done.
-----------2 digit typing results-------------
#Locus Allele 1 Confidence Allele 2 Confidence
E E*01 NA hoz("E*01") NA
F F*01 NA hoz("F*01") NA
G no NA hoz("G*01") NA
H H*02 0 hoz("H*02") NA
J J*01 NA hoz("J*01") NA
K K*01 NA hoz("K*01") NA
L L*01 NA hoz("L*01") NA
P no NA hoz("P*02") NA
V no NA hoz("V*01") NA
Calculation of locus-specific expression ...
test-ClassI-nonclass.bowtielog
E: 115.75 RPKM
G: 0.0 RPKM
F: 18.44 RPKM
H: 25.7 RPKM
K: 0.16 RPKM
J: 0.03 RPKM
L: 0.11 RPKM
P: 0.0 RPKM
V: 0.0 RPKM
# reads processed: 4465
# reads with at least one reported alignment: 4465 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 35369 paired-end alignments to 1 output stream(s)
The digital haplotype is written into test-ClassI-nonclass.digitalhaplotype3
-----------4 digit typing results-------------
#Locus Allele 1 Confidence Allele 2 Confidence
E E*01:01 NA E*01:01 NA
F F*01:01' NA F*01:01 NA
G no NA no NA
H H*02:06 0.0 H*02:06 NA
J J*01:01 NA J*01:01 NA
K K*01:01 NA K*01:01 NA
L L*01:01 NA L*01:01 NA
P no NA no NA
V no NA no NA
----------HLA class II------------
First iteration starts....
Mapping ......
# reads processed: 34632623
# reads with at least one reported alignment: 1889 (0.01%)
# reads that failed to align: 34630734 (99.99%)
Reported 57508 paired-end alignments to 1 output stream(s)
Calculation of first digital haplotype.....
The digital haplotype is written into test-ClassII.digitalhaplotype1
1st iteration done.
Now removing reads that mapped to the three top-scoring groups .......
Second iterations starts .....
Mapping ......
# reads processed: 395
# reads with at least one reported alignment: 395 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 19773 paired-end alignments to 1 output stream(s)
Calculation of second digital haplotype.....
The digital haplotype is written into test-ClassII.digitalhaplotype2
2nd iteration done.
-----------2 digit typing results-------------
#Locus Allele 1 Confidence Allele 2 Confidence
DQA1 DQA1*03 1.534371e-05 DQA1*05 0.2468488
DQB1 DQB1*02 0.2004472 DQB1*03 0.3331947
DRB1 DRB1*04 0.0006670405 DRB1*03 0.04627267
DRA DRA*01 NA hoz("DRA*01") NA
DPA1 DPA1*01 0.3854021 hoz("DPA1*02") 0.0574801
DPB1 DPB1*105 1.045282e-07 DPB1*04 0.1606171
Calculation of locus-specific expression ...
test-ClassII.bowtielog
DQB1: 2.54 RPKM
DQA1: 6.5 RPKM
DRB1: 26.2 RPKM
DPB1: 12.76 RPKM
DRA: 40.35 RPKM
DPA1: 17.21 RPKM
# reads processed: 1889
# reads with at least one reported alignment: 1889 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 57508 paired-end alignments to 1 output stream(s)
The digital haplotype is written into test-ClassII.digitalhaplotype3
-----------4 digit typing results-------------
#Locus Allele 1 Confidence Allele 2 Confidence
DQA1 DQA1*03:02' 1.534371e-05 DQA1*05:01 0.2468488
DQB1 DQB1*02:01' 0.2004472 DQB1*03:03' 0.3331947
DRB1 DRB1*04:01' 0.0006670405 DRB1*03:01 0.04627267
DRA DRA*01:01 NA DRA*01:01 NA
DPA1 DPA1*01:03 0.3854021 DPA1*01:03 0.0574801
DPB1 DPB1*105:01 1.045282e-07 DPB1*04:01 0.1606171
[user@cn3144 46116226]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
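The 4-digit typing tables in the session above list a p-value next to each allele call. As a minimal sketch, assuming the terminal output was saved to a file and that rows keep the five whitespace-separated columns shown above, the calls below a chosen cutoff could be listed with a small helper. The function name `filter_calls`, the log file name, and the cutoff values are illustrative assumptions and are not part of seq2HLA.

```shell
#!/bin/bash
# Hypothetical helper (not part of seq2HLA): list 4-digit allele calls whose
# p-value is below a cutoff. Reads seq2HLA terminal output on stdin.
# Assumed row layout, as printed in the session:
#   Locus  Allele1  Confidence1  Allele2  Confidence2
filter_calls() {
    local cutoff="${1:-0.05}"    # illustrative default cutoff
    awk -v cutoff="$cutoff" '
        # skip header/bowtie lines starting with "#" and anything that is not 5 columns
        /^#/ || NF != 5 { next }
        {
            # columns 2 and 4 hold alleles, columns 3 and 5 hold their p-values
            for (i = 2; i <= 4; i += 2) {
                allele = $i
                p = $(i + 1)
                # keep 4-digit alleles with a numeric p-value (this drops NA) below the cutoff
                if (allele ~ /\*[0-9]+:[0-9]+/ &&
                    p ~ /^[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/ &&
                    p + 0 < cutoff + 0)
                    print $1, allele, p
            }
        }'
}
```

For example, `filter_calls 0.01 < test.log` would print one line per qualifying allele (locus, allele, p-value), where `test.log` is a hypothetical file holding the captured output. The 2-digit tables are ignored because their allele names contain no colon.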
Create a batch input file (e.g. seq2hla.sh). For example:
#!/bin/bash
set -e
module load seq2hla
seq2HLA -1 SRR4300096_1.fastq.gz -2 SRR4300096_2.fastq.gz -r test -p $SLURM_CPUS_PER_TASK
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=# [--mem=#] seq2hla.sh
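The interactive session above ran in node-local scratch (/lscratch/$SLURM_JOB_ID), and a batch job can be arranged the same way. The following is only a sketch: the script name `seq2hla_lscratch.sh` is made up for illustration, and it assumes that the FASTQ files sit in the directory the job is submitted from and that every result file begins with the run name given to `-r`, as the file names in the log above suggest.

```shell
# Write a hypothetical batch script that mirrors the interactive session.
# The quoted 'EOF' keeps all $VARIABLES literal so they are expanded by the job.
cat > seq2hla_lscratch.sh <<'EOF'
#!/bin/bash
set -e
module load seq2hla

# Work in node-local scratch, as in the interactive session
cd "/lscratch/$SLURM_JOB_ID"

# Assumption: the input files are in the directory the job was submitted from
cp "$SLURM_SUBMIT_DIR"/SRR4300096_[12].fastq.gz .

seq2HLA -1 SRR4300096_1.fastq.gz -2 SRR4300096_2.fastq.gz -r test -p "$SLURM_CPUS_PER_TASK"

# Assumption: result files start with the run name "test"; copy them back
# because node-local scratch is not kept after the job ends
cp test* "$SLURM_SUBMIT_DIR"/
EOF
```

A script like this would need local scratch to be requested at submission, for example `sbatch --cpus-per-task=8 --mem=32g --gres=lscratch:10 seq2hla_lscratch.sh`, where the resource values simply repeat those of the interactive session and may need adjusting for other data.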
Create a swarmfile (e.g. seq2hla.swarm). For example:
seq2HLA -1 sample1_1.fastq.gz -2 sample1_2.fastq.gz -r sample1 -p $SLURM_CPUS_PER_TASK
seq2HLA -1 sample2_1.fastq.gz -2 sample2_2.fastq.gz -r sample2 -p $SLURM_CPUS_PER_TASK
seq2HLA -1 sample3_1.fastq.gz -2 sample3_2.fastq.gz -r sample3 -p $SLURM_CPUS_PER_TASK
seq2HLA -1 sample4_1.fastq.gz -2 sample4_2.fastq.gz -r sample4 -p $SLURM_CPUS_PER_TASK
Submit this job using the swarm command.
swarm -f seq2hla.swarm [-g #] -t # --module seq2hla
where
| -g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
| -t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
| --module seq2hla | Loads the seq2HLA module for each subjob in the swarm |
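With many samples, the swarmfile can be generated rather than typed by hand. The sketch below assumes read pairs are named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz`, as in the examples above; the function name `make_swarm` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print one seq2HLA command per read pair found in a directory.
# Assumes pairs are named <sample>_1.fastq.gz / <sample>_2.fastq.gz.
make_swarm() {
    local dir="${1:-.}" r1 r2 sample
    for r1 in "$dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue                 # glob matched nothing
        r2="${r1%_1.fastq.gz}_2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no mate file" >&2
            continue
        fi
        sample="$(basename "${r1%_1.fastq.gz}")" # run name passed to -r
        # \$ keeps the variable literal so it is expanded inside each subjob
        echo "seq2HLA -1 $r1 -2 $r2 -r $sample -p \$SLURM_CPUS_PER_TASK"
    done
}
```

Running `make_swarm . > seq2hla.swarm` in the directory holding the FASTQ files would write one command line per complete pair; files without a mate are reported on stderr and left out.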