OptiType is an HLA genotyping algorithm based on integer linear programming. It produces accurate 4-digit HLA genotype predictions from NGS data by simultaneously selecting all major and minor HLA class I alleles.
Allocate an interactive session and run the program.
Sample session (user commands follow the $ prompt):
[user@biowulf]$ sinteractive --cpus-per-task 4 --gres lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load optitype
[+] Loading glpk 4.65
[+] Loading HDF5 1.10.1
[+] Loading samtools 1.9 ...
[+] Loading optitype, version 1.3.2...
[user@cn3144 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144 46116226]$ cp -a $OPTITYPE_HOME/test .
[user@cn3144 46116226]$ cp $OPTITYPE_HOME/config.ini . # copy the default configuration and edit it to set the number of threads you want to use.
[user@cn3144 46116226]$ OptiTypePipeline.py -i ./test/rna/CRC_81_N_1_fished.fastq ./test/rna/CRC_81_N_2_fished.fastq --rna -v -o ./test/rna/ -c config.ini
mapping with 4 threads...
0:00:03.77 Mapping CRC_81_N_1_fished.fastq to NUC reference...
0:00:08.84 Mapping CRC_81_N_2_fished.fastq to NUC reference...
0:00:14.42 Generating binary hit matrix.
0:00:14.45 Loading ./test/rna/2019_02_01_16_43_08/2019_02_01_16_43_08_1.bam started. Number of HLA reads loaded (updated every thousand):
0:00:14.60 301 reads loaded. Creating dataframe...
0:00:14.62 Dataframes created. Shape: 301 x 7339, hits: 67072 (111698), sparsity: 1 in 19.78
0:00:14.70 Loading ./test/rna/2019_02_01_16_43_08/2019_02_01_16_43_08_2.bam started. Number of HLA reads loaded (updated every thousand):
0:00:14.89 291 reads loaded. Creating dataframe...
0:00:14.91 Dataframes created. Shape: 291 x 7339, hits: 59422 (104782), sparsity: 1 in 20.38
0:00:14.95 Alignment pairing completed. 152 paired, 268 unpaired, 10 discordant
0:00:17.07 temporary pruning of identical rows and columns
0:00:17.11 Size of mtx with unique rows and columns: (124, 250)
0:00:17.11 determining minimal set of non-overshadowed alleles
0:00:17.58 Keeping only the minimal number of required alleles (14,)
0:00:17.58 Creating compact model...
starting ilp solver with 1 threads...
0:00:17.60 Initializing OptiType model...
GLPSOL: GLPK LP/MIP Solver, v4.65
Parameter(s) specified in the command line:
--write /tmp/tmpHIKpQd.glpk.raw --wglp /tmp/tmpM6Df7g.glpk.glp --cpxlp /tmp/tmpkcqGUm.pyomo.lp
Reading problem data from '/tmp/tmpkcqGUm.pyomo.lp'...
/tmp/tmpkcqGUm.pyomo.lp:715: warning: lower bound of variable 'x1' redefined
/tmp/tmpkcqGUm.pyomo.lp:715: warning: upper bound of variable 'x1' redefined
104 rows, 64 columns, 283 non-zeros
38 integer variables, all of which are binary
753 lines were read
Writing problem data to '/tmp/tmpM6Df7g.glpk.glp'...
632 lines were written
GLPK Integer Optimizer, v4.65
104 rows, 64 columns, 283 non-zeros
38 integer variables, all of which are binary
Preprocessing...
14 hidden covering inequaliti(es) were detected
103 rows, 63 columns, 282 non-zeros
38 integer variables, all of which are binary
Scaling...
A: min|aij| = 1.000e+00 max|aij| = 3.000e+00 ratio = 3.000e+00
Problem data seem to be well scaled
Constructing initial basis...
Size of triangular part is 103
Solving LP relaxation...
GLPK Simplex Optimizer, v4.65
103 rows, 63 columns, 282 non-zeros
0: obj = -0.000000000e+00 inf = 3.000e+00 (3)
3: obj = -0.000000000e+00 inf = 0.000e+00 (0)
* 65: obj = 1.284360000e+02 inf = 6.291e-16 (0)
OPTIMAL LP SOLUTION FOUND
Integer optimization begins...
Long-step dual simplex will be used
+ 65: mip = not found yet <= +inf (1; 0)
+ 65: >>>>> 1.284360000e+02 <= 1.284360000e+02 0.0% (1; 0)
+ 65: mip = 1.284360000e+02 <= tree is empty 0.0% (0; 1)
INTEGER OPTIMAL SOLUTION FOUND
Time used: 0.0 secs
Memory used: 0.2 Mb (193401 bytes)
Writing MIP solution to '/tmp/tmpHIKpQd.glpk.raw'...
177 lines were written
0:00:17.78 Result dataframe has been constructed...
[user@cn3144 46116226]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
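OptiType does not print the final genotype to the terminal. Assuming the upstream OptiType 1.3 output layout, it is written to a timestamped subdirectory of the -o directory (the log above shows ./test/rna/2019_02_01_16_43_08/) as a tab-separated <timestamp>_result.tsv file. The two helper functions below (latest_result and print_genotype are hypothetical names, not part of OptiType) are a minimal sketch for pulling the call out of that file.

```bash
# Sketch: show the HLA genotype from the most recent OptiType run in a directory.
# ASSUMPTION: results live in <outdir>/<timestamp>/<timestamp>_result.tsv, a
# tab-separated file whose header is: (blank) A1 A2 B1 B2 C1 C2 Reads Objective.

# latest_result DIR: print the path of the newest *_result.tsv under DIR.
# Timestamps (YYYY_MM_DD_HH_MM_SS) sort lexicographically, so plain sort works.
latest_result() {
    ls -1 "$1"/*/*_result.tsv 2>/dev/null | sort | tail -n 1
}

# print_genotype FILE: print the six allele columns (A1..C2) of the first data row.
print_genotype() {
    awk -F'\t' 'NR == 2 { print $2, $3, $4, $5, $6, $7 }' "$1"
}
```

For the test run above, the call would be `print_genotype "$(latest_result ./test/rna)"`. Because every run creates a new timestamped directory, picking the newest one avoids reading stale results from earlier runs in the same output directory.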
Create a batch input file (e.g. optitype.sh). For example:
#!/bin/bash
set -e
module load optitype
OptiTypePipeline.py -i ./test/rna/CRC_81_N_1_fished.fastq ./test/rna/CRC_81_N_2_fished.fastq --rna -v -o ./test/rna/ -c config.ini
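The number of mapping threads comes from config.ini, not from the sbatch request, so the two can drift apart. A minimal sketch for keeping them in step is to rewrite config.ini at the top of the batch script, before the OptiTypePipeline.py line. It assumes config.ini follows the upstream layout, with a threads= key inside the [mapping] section (the [ilp] section has a separate threads= key that is left alone); set_mapping_threads is a hypothetical helper name.

```bash
# Sketch: match OptiType's read-mapping threads to the Slurm allocation.
# ASSUMPTION: config.ini has "threads=N" inside its [mapping] section.

# set_mapping_threads FILE N: rewrite threads= only between [mapping] and the next [section].
set_mapping_threads() {
    sed -i "/^\[mapping\]/,/^\[/ s/^threads=.*/threads=$2/" "$1"
}

# In a batch job, use the allocated CPUs; fall back to 4 when run outside Slurm.
if [ -f config.ini ]; then
    set_mapping_threads config.ini "${SLURM_CPUS_PER_TASK:-4}"
fi
```

Restricting the substitution to the [mapping] range means the ILP solver's own thread setting is not overwritten.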
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] optitype.sh
Create a swarmfile (e.g. optitype.swarm). For example:
OptiTypePipeline.py -i ./test/exome/NA11995_SRR766010_1_fished.fastq ./test/exome/NA11995_SRR766010_2_fished.fastq --dna -v -o ./test/exome/ -c config.ini
OptiTypePipeline.py -i ./test/rna/CRC_81_N_1_fished.fastq ./test/rna/CRC_81_N_2_fished.fastq --rna -v -o ./test/rna/ -c config.ini
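With many samples, the swarmfile can be generated rather than written by hand. The sketch below writes one command per sample; it assumes a hypothetical naming scheme in which mates are called <sample>_1.fastq and <sample>_2.fastq in a single directory, that the data are DNA (swap --dna for --rna with RNA-seq), that paths contain no spaces, and that each sample gets its own output directory ./out/<sample>/. make_swarm is a hypothetical helper name.

```bash
# Sketch: build an OptiType swarmfile from paired FASTQ files.
# ASSUMPTIONS: mates are <sample>_1.fastq / <sample>_2.fastq, DNA data,
# no spaces in paths, one output directory per sample under ./out/.

# make_swarm FASTQ_DIR SWARMFILE: one line per *_1.fastq that has a matching *_2.fastq.
make_swarm() {
    local dir=$1 swarmfile=$2 r1 r2 sample
    : > "$swarmfile"                      # start with an empty swarmfile
    for r1 in "$dir"/*_1.fastq; do
        [ -e "$r1" ] || continue          # no match: the glob stays literal, skip it
        r2=${r1%_1.fastq}_2.fastq
        [ -e "$r2" ] || { echo "missing mate for $r1" >&2; continue; }
        sample=$(basename "${r1%_1.fastq}")
        # mkdir first so each subjob has a separate output directory
        echo "mkdir -p ./out/$sample && OptiTypePipeline.py -i $r1 $r2 --dna -v -o ./out/$sample/ -c config.ini" >> "$swarmfile"
    done
}
```

For example, `make_swarm ./fastq optitype.swarm` would write optitype.swarm in the current directory. Samples with a missing mate are reported on stderr and skipped rather than producing a broken swarm line.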
Submit this job using the swarm command.
swarm -f optitype.swarm [-g #] [-t #] --module optitype
where
| Option | Description |
|---|---|
| -g # | Number of gigabytes of memory required for each process (1 line in the swarm command file) |
| -t # | Number of threads/CPUs required for each process (1 line in the swarm command file) |
| --module optitype | Loads the OptiType module for each subjob in the swarm |
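The -t value only reserves CPUs; OptiType decides how many mapping threads to start from config.ini. One way to keep them consistent is to read the value back out of config.ini when calling swarm. This is a sketch assuming config.ini has a threads= key in its [mapping] section; mapping_threads is a hypothetical helper name.

```bash
# Sketch: print the mapping thread count configured for OptiType.
# ASSUMPTION: config.ini contains "threads=N" inside its [mapping] section.

# mapping_threads FILE: track the current [section]; print threads= from [mapping] only.
mapping_threads() {
    awk -F= '/^\[/ { in_map = ($0 == "[mapping]") }
             in_map && $1 == "threads" { print $2; exit }' "$1"
}
```

A possible invocation is `swarm -f optitype.swarm -g 8 -t "$(mapping_threads config.ini)" --module optitype`, where the 8 GB memory request is only a placeholder to adjust for your data.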