GPU-Accelerated Python STOPgap for Template Matching (GAPSTOP) is open-source, Python-based software for fast template matching (TM) in cryo-electron tomograms, built on the TM routines from STOPGAP. GAPSTOP is designed to run efficiently on GPU-accelerated HPC systems: it speeds up template matching by using an MPI-parallel layout and by offloading the compute-heavy correlation kernel to one or more accelerator devices per MPI process using JAX.
Allocate an interactive session and run the program. 
Sample session (user input in bold):
[user@biowulf]$ sinteractive --gres=gpu:p100:1 --mem=8g -c8 --no-res-shell
salloc.exe: Pending job allocation 44219178
salloc.exe: job 44219178 queued and waiting for resources
salloc.exe: job 44219178 has been allocated resources
salloc.exe: Granted job allocation 44219178
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn2364 are ready for job
[user@cn2364 ~]$ module load gapstop
[user@cn2364 ~]$ gapstop --help
[user@cn2364 ~]$ wget https://oc.biophys.mpg.de/owncloud/s/Wi6xyXCFTXckg8M/download/tm_tutorial.zip
[user@cn2364 ~]$ unzip tm_tutorial.zip
[user@cn2364 ~]$ cd inputs
[user@cn2364 ~]$ srun --mpi=pmix_v3 gapstop run_tm -n 1 tm_param.star
The -n option of gapstop run_tm sets the "number of tiles to decompose tomogram". For the job to run efficiently, it is best to set it to the number of tasks.
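The reasoning behind matching the tile count to the task count can be illustrated with a small sketch. It assumes a simple round-robin assignment of tiles to tasks, which is only an illustrative assumption and not necessarily how GAPSTOP schedules its work; the function name tiles_per_task is hypothetical and not part of GAPSTOP.

```python
from collections import Counter


def tiles_per_task(n_tiles: int, n_tasks: int) -> list[int]:
    """Count how many tiles each task receives.

    Assumes round-robin assignment (tile i goes to task i % n_tasks).
    This is an illustrative assumption, not GAPSTOP's documented behavior.
    """
    counts = Counter(i % n_tasks for i in range(n_tiles))
    return [counts.get(rank, 0) for rank in range(n_tasks)]


if __name__ == "__main__":
    # Compare fewer, equal, and more tiles than tasks for a 2-task job.
    for n_tiles, n_tasks in [(1, 2), (2, 2), (3, 2)]:
        load = tiles_per_task(n_tiles, n_tasks)
        print(f"tiles={n_tiles} tasks={n_tasks} tiles per task={load}")
```

Under this assumption, fewer tiles than tasks leaves a task (and its GPU) idle, while a tile count that is not a multiple of the task count leaves some tasks waiting for others to finish. Equal counts give every task exactly one tile.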
Sample session (user input in bold):
[user@biowulf]$ cat << 'EOS' > gapstop.sh
> #!/bin/bash
> ml gapstop
> ulimit -l unlimited
> gapstop run_tm -n 2 tm_param.star
> EOS
[user@biowulf]$ sbatch --partition=gpu --gres=gpu:p100:2 -c8 --mem=16g gapstop.sh