seq2HLA computationally determines the human leukocyte antigen (HLA) genotypes of a sample. It takes standard RNA-Seq reads in FASTQ format as input, maps them against a bowtie index comprising all HLA alleles, and outputs the most likely HLA class I and class II genotypes (at 4-digit resolution), a p-value for each call, and the expression of each class.
Allocate an interactive session and run the program.
Sample session (user input follows the shell prompts):
[user@biowulf]$ sinteractive --cpus-per-task 8 --mem 32g --gres lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load seq2hla
[user@cn3144 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144 46116226]$ cp $SEQ2HLA_HOME/test/*.fastq.gz .
[user@cn3144 46116226]$ seq2HLA -1 SRR4300096_1.fastq.gz -2 SRR4300096_2.fastq.gz -r test -p $SLURM_CPUS_PER_TASK
Now running seq2HLA version 2.3!
Input is a gipped file .....
The read length of your input fastq was determined to be 100, so 2 mismatches will be allowed and 8 threads will be used by bowtie.
----------HLA class I------------
>---classical HLA alleles---
First iteration starts....
Mapping ......
# reads processed: 34632623
# reads with at least one reported alignment: 10689 (0.03%)
# reads that failed to align: 34621934 (99.97%)
Reported 853322 paired-end alignments to 1 output stream(s)
Calculation of first digital haplotype.....
The digital haplotype is written into test-ClassI-class.digitalhaplotype1
1st iteration done.
Now removing reads that mapped to the three top-scoring groups .......
Second iterations starts .....
Mapping ......
# reads processed: 4332
# reads with at least one reported alignment: 4332 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 363543 paired-end alignments to 1 output stream(s)
Calculation of second digital haplotype.....
The digital haplotype is written into test-ClassI-class.digitalhaplotype2
2nd iteration done.
-----------2 digit typing results-------------
#Locus	Allele 1	Confidence	Allele 2	Confidence
A	A*25	0.0006404339	A*01	0.01656387
B	B*08	0.0008702224	B*18	1.98204e-05
C	C*12	7.154939e-05	C*07	0.006626664
Calculation of locus-specific expression ...
test-ClassI-class.bowtielog
A: 76.22 RPKM
C: 91.62 RPKM
B: 96.65 RPKM
-----------4 digit typing results-------------
#Locus	Allele 1	Confidence	Allele 2	Confidence
A	A*25:01	0.0006404339	A*01:01	0.0179885
B	B*08:01	0.0008702224	B*18:01	1.98204e-05
C	C*12:03	7.51191e-05	C*07:01'	0.006944861
----------HLA class I------------
>---nonclassical HLA alleles---
First iteration starts....
Mapping ......
# reads processed: 34632623
# reads with at least one reported alignment: 4465 (0.01%)
# reads that failed to align: 34628158 (99.99%)
Reported 35369 paired-end alignments to 1 output stream(s)
Calculation of first digital haplotype.....
The digital haplotype is written into test-ClassI-nonclass.digitalhaplotype1
1st iteration done.
Now removing reads that mapped to the three top-scoring groups .......
Second iterations starts .....
Mapping ......
Warning: Could not find any reads in "test-ClassI-nonclass-2nditeration_1.fq"
Warning: Could not find any reads in "test-ClassI-nonclass-2nditeration_2.fq"
# reads processed: 0
# reads with at least one reported alignment: 0 (0.00%)
# reads that failed to align: 0 (0.00%)
No alignments
Calculation of second digital haplotype.....
The digital haplotype is written into test-ClassI-nonclass.digitalhaplotype2
2nd iteration done.
-----------2 digit typing results-------------
#Locus	Allele 1	Confidence	Allele 2	Confidence
E	E*01	NA	hoz("E*01")	NA
F	F*01	NA	hoz("F*01")	NA
G	no	NA	hoz("G*01")	NA
H	H*02	0	hoz("H*02")	NA
J	J*01	NA	hoz("J*01")	NA
K	K*01	NA	hoz("K*01")	NA
L	L*01	NA	hoz("L*01")	NA
P	no	NA	hoz("P*02")	NA
V	no	NA	hoz("V*01")	NA
Calculation of locus-specific expression ...
test-ClassI-nonclass.bowtielog
E: 115.75 RPKM
G: 0.0 RPKM
F: 18.44 RPKM
H: 25.7 RPKM
K: 0.16 RPKM
J: 0.03 RPKM
L: 0.11 RPKM
P: 0.0 RPKM
V: 0.0 RPKM
# reads processed: 4465
# reads with at least one reported alignment: 4465 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 35369 paired-end alignments to 1 output stream(s)
The digital haplotype is written into test-ClassI-nonclass.digitalhaplotype3
-----------4 digit typing results-------------
#Locus	Allele 1	Confidence	Allele 2	Confidence
E	E*01:01	NA	E*01:01	NA
F	F*01:01'	NA	F*01:01	NA
G	no	NA	no	NA
H	H*02:06	0.0	H*02:06	NA
J	J*01:01	NA	J*01:01	NA
K	K*01:01	NA	K*01:01	NA
L	L*01:01	NA	L*01:01	NA
P	no	NA	no	NA
V	no	NA	no	NA
----------HLA class II------------
First iteration starts....
Mapping ......
# reads processed: 34632623
# reads with at least one reported alignment: 1889 (0.01%)
# reads that failed to align: 34630734 (99.99%)
Reported 57508 paired-end alignments to 1 output stream(s)
Calculation of first digital haplotype.....
The digital haplotype is written into test-ClassII.digitalhaplotype1
1st iteration done.
Now removing reads that mapped to the three top-scoring groups .......
Second iterations starts .....
Mapping ......
# reads processed: 395
# reads with at least one reported alignment: 395 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 19773 paired-end alignments to 1 output stream(s)
Calculation of second digital haplotype.....
The digital haplotype is written into test-ClassII.digitalhaplotype2
2nd iteration done.
-----------2 digit typing results-------------
#Locus	Allele 1	Confidence	Allele 2	Confidence
DQA1	DQA1*03	1.534371e-05	DQA1*05	0.2468488
DQB1	DQB1*02	0.2004472	DQB1*03	0.3331947
DRB1	DRB1*04	0.0006670405	DRB1*03	0.04627267
DRA	DRA*01	NA	hoz("DRA*01")	NA
DPA1	DPA1*01	0.3854021	hoz("DPA1*02")	0.0574801
DPB1	DPB1*105	1.045282e-07	DPB1*04	0.1606171
Calculation of locus-specific expression ...
test-ClassII.bowtielog
DQB1: 2.54 RPKM
DQA1: 6.5 RPKM
DRB1: 26.2 RPKM
DPB1: 12.76 RPKM
DRA: 40.35 RPKM
DPA1: 17.21 RPKM
# reads processed: 1889
# reads with at least one reported alignment: 1889 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 57508 paired-end alignments to 1 output stream(s)
The digital haplotype is written into test-ClassII.digitalhaplotype3
-----------4 digit typing results-------------
#Locus	Allele 1	Confidence	Allele 2	Confidence
DQA1	DQA1*03:02'	1.534371e-05	DQA1*05:01	0.2468488
DQB1	DQB1*02:01'	0.2004472	DQB1*03:03'	0.3331947
DRB1	DRB1*04:01'	0.0006670405	DRB1*03:01	0.04627267
DRA	DRA*01:01	NA	DRA*01:01	NA
DPA1	DPA1*01:03	0.3854021	DPA1*01:03	0.0574801
DPB1	DPB1*105:01	1.045282e-07	DPB1*04:01	0.1606171

[user@cn3144 46116226]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
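When typing several samples it can be convenient to keep the console output and pull out only the final 4-digit tables. The helper below is a minimal sketch and is not part of seq2HLA: the function name `seq2hla_4digit_tables` is hypothetical, and it assumes the output was saved to a file (for example with `seq2HLA ... 2>&1 | tee test.log`) with one result row per line, as seq2HLA prints it to the terminal. It prints every table that follows a "4 digit typing results" banner and stops at the next blank line or the next line beginning with `-`.

```bash
#!/bin/bash
# Hypothetical helper, not part of seq2HLA: print the "4 digit typing results"
# tables from a saved seq2HLA console log.
# Assumes one result row per line; a table ends at the next blank line or at
# the next line starting with "-" (e.g. "----------HLA class II------------").
seq2hla_4digit_tables() {
  local log="$1"
  awk '
    /4 digit typing results/ { in_table = 1; next }  # banner: start of a table
    in_table && (NF == 0 || /^-/) { in_table = 0 }    # blank or "-" line ends it
    in_table { print }                                # header and allele rows
  ' "$log"
}
```

Usage: `seq2hla_4digit_tables test.log` prints the class I classical, class I nonclassical and class II 4-digit tables in the order seq2HLA reported them. The 2-digit tables are skipped because their banner does not match.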
Create a batch input file (e.g. seq2hla.sh). For example:
#!/bin/bash
set -e
module load seq2hla
seq2HLA -1 SRR4300096_1.fastq.gz -2 SRR4300096_2.fastq.gz -r test -p $SLURM_CPUS_PER_TASK
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=# [--mem=#] seq2hla.sh
Create a swarmfile (e.g. seq2hla.swarm). For example:
seq2HLA -1 sample1_1.fastq.gz -2 sample1_2.fastq.gz -r sample1 -p $SLURM_CPUS_PER_TASK
seq2HLA -1 sample2_1.fastq.gz -2 sample2_2.fastq.gz -r sample2 -p $SLURM_CPUS_PER_TASK
seq2HLA -1 sample3_1.fastq.gz -2 sample3_2.fastq.gz -r sample3 -p $SLURM_CPUS_PER_TASK
seq2HLA -1 sample4_1.fastq.gz -2 sample4_2.fastq.gz -r sample4 -p $SLURM_CPUS_PER_TASK
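For more than a handful of samples, the swarmfile can be generated rather than typed by hand. The sketch below assumes the paired reads are named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz` (as in the swarmfile example) and that file paths contain no spaces; the function name `make_seq2hla_swarm` is hypothetical. It writes one seq2HLA line per complete pair, uses the sample name as the `-r` output prefix, and leaves `$SLURM_CPUS_PER_TASK` unexpanded so that it is resolved inside each subjob.

```bash
#!/bin/bash
# Hypothetical helper: print one seq2HLA swarm line per paired-end sample.
# Assumes mates are named <sample>_1.fastq.gz / <sample>_2.fastq.gz and that
# paths contain no spaces (swarm lines are plain shell commands).
make_seq2hla_swarm() {
  local dir="${1:-.}"
  local r1 r2 sample
  for r1 in "$dir"/*_1.fastq.gz; do
    [[ -e "$r1" ]] || continue            # glob matched nothing
    r2="${r1%_1.fastq.gz}_2.fastq.gz"
    if [[ ! -e "$r2" ]]; then
      echo "warning: no mate found for $r1, skipping" >&2
      continue
    fi
    sample="$(basename "${r1%_1.fastq.gz}")"
    # The escaped \$ keeps $SLURM_CPUS_PER_TASK literal for each subjob to expand
    printf '%s\n' "seq2HLA -1 $r1 -2 $r2 -r $sample -p \$SLURM_CPUS_PER_TASK"
  done
}
```

Usage: `make_seq2hla_swarm . > seq2hla.swarm` in the directory holding the FASTQ files. Read 1 files without a matching read 2 file are reported on stderr and left out of the swarmfile.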
Submit this job using the swarm command.
swarm -f seq2hla.swarm [-g #] -t # --module seq2hla
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file). |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module seq2hla | Loads the seq2HLA module for each subjob in the swarm. |