AlphaLink on Biowulf

AlphaLink uses deep learning to predict protein structures from a sequence and a set of experimental distance restraints. It extends OpenFold by explicitly incorporating crosslinking mass spectrometry (MS) data, or other experimental distance restraints, into the OpenFold architecture.
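
Restraints are distances between residue pairs, so a quick way to see how a predicted model relates to a given crosslink is to measure the CA-CA (alpha-carbon) distance between the two residues in the output PDB file. The helper below is a minimal bash/awk sketch and is not part of AlphaLink. The function name `ca_distance` is hypothetical. It assumes a single-chain model whose residue numbers match those used in the crosslink file, and it uses the first CA record it finds for each residue.

```shell
# ca_distance MODEL.pdb RES1 RES2
# Hypothetical helper: print the CA-CA distance (in Angstrom) between two
# residues of a single-chain PDB file. Exits with status 1 if either
# residue has no CA atom.
ca_distance() {
    awk -v r1="$2" -v r2="$3" '
        # Fixed PDB columns: atom name 13-16, residue number 23-26,
        # x 31-38, y 39-46, z 47-54.
        /^ATOM/ && substr($0, 13, 4) == " CA " {
            res = substr($0, 23, 4) + 0
            if (res == r1 && !f1) { x1 = substr($0, 31, 8); y1 = substr($0, 39, 8); z1 = substr($0, 47, 8); f1 = 1 }
            if (res == r2 && !f2) { x2 = substr($0, 31, 8); y2 = substr($0, 39, 8); z2 = substr($0, 47, 8); f2 = 1 }
        }
        END {
            if (!(f1 && f2)) { print "ca_distance: residue not found" > "/dev/stderr"; exit 1 }
            printf "%.2f\n", sqrt((x1 - x2)^2 + (y1 - y2)^2 + (z1 - z2)^2)
        }' "$1"
}
```

For example, `ca_distance 'predictions/sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_relaxed.pdb' RES1 RES2` would report the distance for one crosslinked pair. Substitute RES1 and RES2 with residue numbers from your own crosslink file.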

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

The following example predicts the structure of a single protein from its FASTA sequence and a set of residue pairs identified by crosslinking mass spectrometry.

AlphaLink structure prediction

AlphaLink with CSV crosslink restraints

[user@biowulf]$ sinteractive --gres=gpu:1,lscratch:100 --constraint='gpup100|gpuv100|gpuv100x|gpua100' -c 8 --mem=32g
[user@cn3144]$ module load alphalink/1.0
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ cp -r ${ALPHALINK_TEST_DATA:-none}/* .
[user@cn3144]$ time predict_with_crosslinks.py CDK/fasta/CDK.fasta CDK/crosslinks/mixture.csv \
                    --checkpoint_path $ALPHALINK_CP_CACA \
                    --uniref90_database_path $UNIREF90_PATH \
                    --mgnify_database_path $MGNIFY_PATH \
                    --pdb70_database_path $PDB70_PATH \
                    --uniclust30_database_path $UNICLUST30_PATH \
                    --jackhmmer_binary_path $JACKHMMER_BIN \
                    --hhblits_binary_path $HHBLITS_BIN \
                    --hhsearch_binary_path $HHSEARCH_BIN \
                    --kalign_binary_path $KALIGN_BIN
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Loaded OpenFold parameters at /fdb/alphalink/finetuning_model_5_ptm_CACA_10A.pt...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Generating alignments for sp|P24941|CDK2_HUMAN...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Loaded 9 restraints...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Running inference for sp|P24941|CDK2_HUMAN...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Inference time: 56.18929745070636
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Output written to /lscratch/0000000/predictions/sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_unrelaxed.pdb...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Relaxed output written to /lscratch/0000000/predictions/sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_relaxed.pdb...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Model output written to /lscratch/0000000/predictions/sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_output_dict.pkl...
real    44m12.596s
user    159m13.407s
sys     4m3.506s
[user@cn3144]$ ls -lh predictions
total 153M
-rw-r--r-- 1 user group 153M Oct 17 12:26 'sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_output_dict.pkl'
-rw-r--r-- 1 user group 384K Oct 17 12:26 'sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_relaxed.pdb'
-rw-r--r-- 1 user group 190K Oct 17 12:25 'sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_unrelaxed.pdb'
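
Local scratch (/lscratch/$SLURM_JOB_ID) is deleted when the job ends, so copy the predictions to permanent storage before leaving the session. The output names contain `|` characters taken from the sequence identifier, which must be quoted in the shell. The sketch below copies every file while replacing `|` with `_`. The function name `copy_predictions` is hypothetical and the destination directory is only an example.

```shell
# copy_predictions SRC_DIR DEST_DIR
# Hypothetical helper: copy every regular file in SRC_DIR to DEST_DIR,
# replacing each '|' in the file name with '_' so the copies can be
# used without quoting.
copy_predictions() {
    local src=$1 dest=$2 f base
    mkdir -p -- "$dest"
    for f in "$src"/*; do
        [ -f "$f" ] || continue            # skip subdirectories and an unmatched glob
        base=$(basename -- "$f")
        cp -- "$f" "$dest/${base//\|/_}"   # bash substitution: every '|' becomes '_'
    done
}
```

For example, `copy_predictions predictions /data/$USER/alphalink_CDK2` (an example destination) run from /lscratch/$SLURM_JOB_ID would produce files such as `sp_P24941_CDK2_HUMAN_model_5_ptm_crosslinks_relaxed.pdb`.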

Batch job
Most jobs should be run as batch jobs.

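Create a batch script (for example alphalink.sh) similar to the following to predict a structure with crosslinking restraints. Please run no more than 10 of these jobs at the same time.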

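#!/bin/bash

# Stop at the first command that fails
set -e

# Load the default AlphaLink version; the module defines the checkpoint,
# database, and binary path variables used below
module load alphalink

# Predict the structure of the sequence in protein.fasta, using the
# crosslinks in crosslinks.csv as distance restraints
predict_with_crosslinks.py protein.fasta crosslinks.csv \
                    --checkpoint_path "$ALPHALINK_CP_CACA" \
                    --uniref90_database_path "$UNIREF90_PATH" \
                    --mgnify_database_path "$MGNIFY_PATH" \
                    --pdb70_database_path "$PDB70_PATH" \
                    --uniclust30_database_path "$UNICLUST30_PATH" \
                    --jackhmmer_binary_path "$JACKHMMER_BIN" \
                    --hhblits_binary_path "$HHBLITS_BIN" \
                    --hhsearch_binary_path "$HHSEARCH_BIN" \
                    --kalign_binary_path "$KALIGN_BIN"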
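
The script relies on environment variables defined by the alphalink module (the same ones used in the interactive example). If one of them were missing, the job would only fail once predict_with_crosslinks.py reached that argument. A guard like the following, placed after `module load`, stops the job immediately with a clear message. The function name `require_vars` is hypothetical and is not provided by the module.

```shell
# require_vars NAME...
# Hypothetical helper: report every named environment variable that is
# unset or empty, and return 1 if any were missing.
require_vars() {
    local name missing=0
    for name in "$@"; do
        if [ -z "${!name:-}" ]; then       # indirect expansion: value of the variable named $name
            echo "error: \$$name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

With `set -e` active, a call such as `require_vars ALPHALINK_CP_CACA UNIREF90_PATH MGNIFY_PATH PDB70_PATH UNICLUST30_PATH JACKHMMER_BIN HHBLITS_BIN HHSEARCH_BIN KALIGN_BIN` ends the script if any variable is missing.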

Submit this job using the Slurm sbatch command.

[user@biowulf]$ sbatch --cpus-per-task=8 --mem=32g --time=3:00:00 \
                   --gres=lscratch:50,gpu:1 \
                   --partition=gpu \
                   --constraint='gpup100|gpuv100|gpuv100x|gpua100' \
                   alphalink.sh
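
To predict several structures while staying within the limit of 10 concurrent jobs, one option is a Slurm job array: the `%10` suffix of `--array` caps how many array tasks run at once. The script below is a sketch under stated assumptions. The script name alphalink_array.sh and the list file inputs.txt (one whitespace-separated "FASTA CSV" pair per line, with no spaces in file names) are hypothetical.

```shell
#!/bin/bash
# alphalink_array.sh (hypothetical name): one AlphaLink prediction per
# array task. Each line of inputs.txt (hypothetical) is "FASTA CSV".
#
# Submit with, for example:
#   sbatch --array=1-$(wc -l < inputs.txt)%10 \
#          --cpus-per-task=8 --mem=32g --time=3:00:00 \
#          --gres=lscratch:50,gpu:1 --partition=gpu \
#          --constraint='gpup100|gpuv100|gpuv100x|gpua100' \
#          alphalink_array.sh
# The %10 suffix keeps at most 10 array tasks running at the same time.
set -e

# Print line TASK_ID of LIST_FILE (the FASTA/CSV pair for that task).
pair_for_task() {
    sed -n "${2}p" "$1"
}

# Only run a prediction when executing as a Slurm array task.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    read -r fasta csv <<< "$(pair_for_task inputs.txt "$SLURM_ARRAY_TASK_ID")"
    if [ -z "$fasta" ] || [ -z "$csv" ]; then
        echo "no FASTA/CSV pair on line $SLURM_ARRAY_TASK_ID of inputs.txt" >&2
        exit 1
    fi

    module load alphalink
    predict_with_crosslinks.py "$fasta" "$csv" \
        --checkpoint_path "$ALPHALINK_CP_CACA" \
        --uniref90_database_path "$UNIREF90_PATH" \
        --mgnify_database_path "$MGNIFY_PATH" \
        --pdb70_database_path "$PDB70_PATH" \
        --uniclust30_database_path "$UNICLUST30_PATH" \
        --jackhmmer_binary_path "$JACKHMMER_BIN" \
        --hhblits_binary_path "$HHBLITS_BIN" \
        --hhsearch_binary_path "$HHSEARCH_BIN" \
        --kalign_binary_path "$KALIGN_BIN"
fi
```

Judging from the interactive example, output files are named after the sequence identifier. Tasks that reuse the same FASTA file in the same working directory would therefore likely overwrite each other's results.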