OptiType is a novel HLA genotyping algorithm based on integer linear programming, capable of producing accurate 4-digit HLA genotyping predictions from NGS data by simultaneously selecting all major and minor HLA Class I alleles.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive --cpus-per-task 4 --gres lscratch:10 salloc.exe: Pending job allocation 46116226 salloc.exe: job 46116226 queued and waiting for resources salloc.exe: job 46116226 has been allocated resources salloc.exe: Granted job allocation 46116226 salloc.exe: Waiting for resource configuration salloc.exe: Nodes cn3144 are ready for job [user@cn3144 ~]$ module load optitype [+] Loading glpk 4.65 [+] Loading HDF5 1.10.1 [+] Loading samtools 1.9 ... [+] Loading optitype, version 1.3.2... [user@cn3144 ~]$ cd /lscratch/$SLURM_JOB_ID [user@cn3144 46116226]$ cp -a $OPTITYPE_HOME/test . [user@cn3144 46116226]$ cp $OPTITYPE_HOME/config.ini . # copy the default configuration and edit it to set the number of threads you want to use. [user@cn3144 46116226]$ OptiTypePipeline.py -i ./test/rna/CRC_81_N_1_fished.fastq ./test/rna/CRC_81_N_2_fished.fastq --rna -v -o ./test/rna/ -c config.ini mapping with 4 threads... 0:00:03.77 Mapping CRC_81_N_1_fished.fastq to NUC reference... 0:00:08.84 Mapping CRC_81_N_2_fished.fastq to NUC reference... 0:00:14.42 Generating binary hit matrix. 0:00:14.45 Loading ./test/rna/2019_02_01_16_43_08/2019_02_01_16_43_08_1.bam started. Number of HLA reads loaded (updated every thousand): 0:00:14.60 301 reads loaded. Creating dataframe... 0:00:14.62 Dataframes created. Shape: 301 x 7339, hits: 67072 (111698), sparsity: 1 in 19.78 0:00:14.70 Loading ./test/rna/2019_02_01_16_43_08/2019_02_01_16_43_08_2.bam started. Number of HLA reads loaded (updated every thousand): 0:00:14.89 291 reads loaded. Creating dataframe... 0:00:14.91 Dataframes created. Shape: 291 x 7339, hits: 59422 (104782), sparsity: 1 in 20.38 0:00:14.95 Alignment pairing completed. 152 paired, 268 unpaired, 10 discordant 0:00:17.07 temporary pruning of identical rows and columns 0:00:17.11 Size of mtx with unique rows and columns: (124, 250) 0:00:17.11 determining minimal set of non-overshadowed alleles 0:00:17.58 Keeping only the minimal number of required alleles (14,) 0:00:17.58 Creating compact model... starting ilp solver with 1 threads... 0:00:17.60 Initializing OptiType model... GLPSOL: GLPK LP/MIP Solver, v4.65 Parameter(s) specified in the command line: --write /tmp/tmpHIKpQd.glpk.raw --wglp /tmp/tmpM6Df7g.glpk.glp --cpxlp /tmp/tmpkcqGUm.pyomo.lp Reading problem data from '/tmp/tmpkcqGUm.pyomo.lp'... /tmp/tmpkcqGUm.pyomo.lp:715: warning: lower bound of variable 'x1' redefined /tmp/tmpkcqGUm.pyomo.lp:715: warning: upper bound of variable 'x1' redefined 104 rows, 64 columns, 283 non-zeros 38 integer variables, all of which are binary 753 lines were read Writing problem data to '/tmp/tmpM6Df7g.glpk.glp'... 632 lines were written GLPK Integer Optimizer, v4.65 104 rows, 64 columns, 283 non-zeros 38 integer variables, all of which are binary Preprocessing... 14 hidden covering inequaliti(es) were detected 103 rows, 63 columns, 282 non-zeros 38 integer variables, all of which are binary Scaling... A: min|aij| = 1.000e+00 max|aij| = 3.000e+00 ratio = 3.000e+00 Problem data seem to be well scaled Constructing initial basis... Size of triangular part is 103 Solving LP relaxation... GLPK Simplex Optimizer, v4.65 103 rows, 63 columns, 282 non-zeros 0: obj = -0.000000000e+00 inf = 3.000e+00 (3) 3: obj = -0.000000000e+00 inf = 0.000e+00 (0) * 65: obj = 1.284360000e+02 inf = 6.291e-16 (0) OPTIMAL LP SOLUTION FOUND Integer optimization begins... Long-step dual simplex will be used + 65: mip = not found yet <= +inf (1; 0) + 65: >>>>> 1.284360000e+02 <= 1.284360000e+02 0.0% (1; 0) + 65: mip = 1.284360000e+02 <= tree is empty 0.0% (0; 1) INTEGER OPTIMAL SOLUTION FOUND Time used: 0.0 secs Memory used: 0.2 Mb (193401 bytes) Writing MIP solution to '/tmp/tmpHIKpQd.glpk.raw'... 177 lines were written 0:00:17.78 Result dataframe has been constructed... [user@cn3144 46116226]$ exit salloc.exe: Relinquishing job allocation 46116226 [user@biowulf ~]$
Create a batch input file (e.g. optitype.sh). For example:
#!/bin/bash set -e module load optitype OptiTypePipeline.py -i ./test/rna/CRC_81_N_1_fished.fastq ./test/rna/CRC_81_N_2_fished.fastq --rna -v -o ./test/rna/ -c config.ini
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] optitype.sh
Create a swarmfile (e.g. optitype.swarm). For example:
python OptiTypePipeline.py -i ./test/exome/NA11995_SRR766010_1_fished.fastq ./test/exome/NA11995_SRR766010_2_fished.fastq --dna -v -o ./test/exome/ -c config.ini OptiTypePipeline.py -i ./test/rna/CRC_81_N_1_fished.fastq ./test/rna/CRC_81_N_2_fished.fastq --rna -v -o ./test/rna/ -c config.ini
Submit this job using the swarm command.
swarm -f optitype.swarm [-g #] [-t #] --module optitypewhere
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module optitype | Loads the OptiType module for each subjob in the swarm |