DeepLoc2 on Biowulf

DeepLoc2 uses deep learning to predict subcellular localization of eukaryotic proteins.

DeepLoc 2.0 predicts the subcellular localization(s) of eukaryotic proteins. It is a multi-label predictor, meaning it can predict one or more localizations for any given protein. It can differentiate between 10 localizations: Nucleus, Cytoplasm, Extracellular, Mitochondrion, Cell membrane, Endoplasmic reticulum, Chloroplast, Golgi apparatus, Lysosome/Vacuole, and Peroxisome. Additionally, DeepLoc 2.0 can predict the presence of the sorting signal(s) that influenced the prediction of the subcellular localization(s).

References:

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program on the test data, then compare the output with the reference results using diff.
Sample session (user input in bold):

[user@biowulf]$ sinteractive --mem=24G --gres=lscratch:5
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load deeploc

[user@cn3144 ~]$ mkdir -p /data/$USER/.cache/torch/hub

[user@cn3144 ~]$ cp -r $DEEPLOC_TRAIN_DATA/checkpoints /data/$USER/.cache/torch/hub/

[user@cn3144 ~]$ cd /lscratch/$SLURM_JOB_ID

[user@cn3144 ~]$ cp $DEEPLOC_TEST_DATA/test.fasta .

[user@cn3144 ~]$ deeploc2 -f test.fasta

[user@cn3144 ~]$ diff outputs/results_20230101-000000.csv $DEEPLOC_TEST_DATA/results_test.csv

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
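Once the run finishes, the results CSV can be filtered with standard tools. The sketch below lists proteins whose predicted localizations include the Nucleus; the column layout (ID in column 1, localizations in column 2) is an assumption, so check the header of your own results file first. A mock CSV is written here only so the example is self-contained.

```shell
# Create a mock results file purely for illustration -- with real data,
# point awk at outputs/results_<timestamp>.csv instead.
cat > results_demo.csv <<'EOF'
Protein_ID,Localizations,Signals
P1,Nucleus,Nuclear localization signal
P2,Cytoplasm,
P3,Cytoplasm|Nucleus,Nuclear localization signal
EOF

# Print IDs of proteins whose predicted localizations include Nucleus
# (assumes ID is column 1 and localizations are column 2).
awk -F',' 'NR > 1 && $2 ~ /Nucleus/ { print $1 }' results_demo.csv
```

With the mock data above, this prints P1 and P3 (multi-label predictions such as "Cytoplasm|Nucleus" match as well).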

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. deeploc.sh). For example:

#!/bin/bash
set -e
module load deeploc
cd /data/$USER
deeploc2 -f input.fasta

Submit this job using the Slurm sbatch command.

sbatch [--mem=#] deeploc.sh
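If the input is large, a variant of the batch script can stage data in node-local /lscratch, as the interactive example does; the input path, output destination, and resource figures below are illustrative, not prescribed values.

```shell
#!/bin/bash
# Hypothetical variant of deeploc.sh that runs from node-local scratch.
# Requires submitting with --gres=lscratch:# so /lscratch/$SLURM_JOB_ID exists.
set -e
module load deeploc
cd /lscratch/$SLURM_JOB_ID
cp /data/$USER/input.fasta .                 # illustrative input location
deeploc2 -f input.fasta
cp -r outputs /data/$USER/deeploc_outputs    # copy results back before the job ends
```

This variant would be submitted with, for example, sbatch --mem=24G --gres=lscratch:5 deeploc.sh.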

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. deeploc.swarm). For example:

deeploc2 -f 01.fasta -o results_01
deeploc2 -f 02.fasta -o results_02
deeploc2 -f 03.fasta -o results_03
deeploc2 -f 04.fasta -o results_04

Submit this job using the swarm command.

swarm -f deeploc.swarm [-g #] --module deeploc
where
  -g #              Number of Gigabytes of memory required for each process (1 line in the swarm command file)
  --module deeploc  Loads the deeploc module for each subjob in the swarm
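For many input files, the swarmfile can be generated with a short loop instead of being written by hand. In this sketch the touch line creates empty demo FASTA files only so the example is self-contained; with real data it is unnecessary.

```shell
# Demo input files so the loop below has something to match (illustrative only).
touch 01.fasta 02.fasta 03.fasta 04.fasta

# Build one swarm command per FASTA file, writing each result set to its
# own output directory named after the input file.
rm -f deeploc.swarm
for f in *.fasta; do
    echo "deeploc2 -f $f -o results_${f%.fasta}" >> deeploc.swarm
done
```

The resulting deeploc.swarm matches the hand-written example above and is submitted the same way.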