AlphaLink predicts protein structures using deep learning given a sequence and a set of experimental contacts. It extends OpenFold with crosslinking MS data or other experimental distance restraints by explicitly incorporating them into the OpenFold architecture.
Module name: alphalink (see the modules page for more information)
Environment variables set:
$ALPHALINK_TEST_DATA
ALPHALINK_CP_[CACA|DIST] - AlphaLink model checkpoint files
[UNIREF90|MGNIFY|PDB70|UNICLUST30]_PATH - Path to AlphaFold DBs
[JACKHMMER|HHBLITS|HHSEARCH|KALIGN]_BIN - Required programs

In the following example we will predict a structure from a single FASTA sequence and the crosslinking mass-spectrometry residue pairs.
[user@biowulf]$ sinteractive --gres=gpu:1,lscratch:100 --constraint='gpup100|gpuv100|gpuv100x|gpua100' -c 8 --mem=32g
[user@cn3144]$ module load alphalink/1.0
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ cp -r ${ALPHALINK_TEST_DATA:-none}/* .
[user@cn3144]$ time predict_with_crosslinks.py CDK/fasta/CDK.fasta CDK/crosslinks/mixture.csv \
--checkpoint_path $ALPHALINK_CP_CACA \
--uniref90_database_path $UNIREF90_PATH \
--mgnify_database_path $MGNIFY_PATH \
--pdb70_database_path $PDB70_PATH \
--uniclust30_database_path $UNICLUST30_PATH \
--jackhmmer_binary_path $JACKHMMER_BIN \
--hhblits_binary_path $HHBLITS_BIN \
--hhsearch_binary_path $HHSEARCH_BIN \
--kalign_binary_path $KALIGN_BIN
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Loaded OpenFold parameters at /fdb/alphalink/finetuning_model_5_ptm_CACA_10A.pt...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Generating alignments for sp|P24941|CDK2_HUMAN...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Loaded 9 restraints...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Running inference for sp|P24941|CDK2_HUMAN...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Inference time: 56.18929745070636
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Output written to /lscratch/0000000/predictions/sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_unrelaxed.pdb...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Relaxed output written to /lscratch/0000000/predictions/sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_relaxed.pdb...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Model output written to /lscratch/0000000/predictions/sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_output_dict.pkl...
real 44m12.596s
user 159m13.407s
sys 4m3.506s
[user@cn3144]$ ls -lh predictions
total 153M
-rw-r--r-- 1 user group 153M Oct 17 12:26 'sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_output_dict.pkl'
-rw-r--r-- 1 user group 384K Oct 17 12:26 'sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_relaxed.pdb'
-rw-r--r-- 1 user group 190K Oct 17 12:25 'sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_unrelaxed.pdb'
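Files under /lscratch/$SLURM_JOB_ID are removed when the job ends, so the predictions should be copied to permanent storage before leaving the session. The following is a minimal sketch of one way to do that. The helper name save_predictions is hypothetical and not part of the alphalink module, and the destination directory is whatever location you choose.

```shell
#!/bin/bash
# save_predictions SRC DEST
# Copy every .pdb and .pkl file from SRC into DEST, creating DEST if needed.
# Hypothetical helper for illustration; not provided by the alphalink module.
save_predictions() {
    local src dest copied f
    src=$1
    dest=$2
    copied=0
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    for f in "$src"/*.pdb "$src"/*.pkl; do
        # An unmatched glob stays as a literal pattern; skip it.
        [ -e "$f" ] || continue
        # Quoting matters here: the output names contain '|' characters.
        cp -- "$f" "$dest"/ || return 1
        copied=$((copied + 1))
    done
    echo "copied $copied file(s) to $dest"
    # Return non-zero when nothing was copied so a missing prediction is noticed.
    [ "$copied" -gt 0 ]
}
```

For example, `save_predictions predictions /data/$USER/alphalink_results` would copy the three files listed above, where /data/$USER/alphalink_results is an assumed destination rather than a required one.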
Create a batch input file (e.g. alphalink.sh). For example:
#! /bin/bash
set -e
module load alphalink
predict_with_crosslinks.py protein.fasta crosslinks.csv \
--checkpoint_path $ALPHALINK_CP_CACA \
--uniref90_database_path $UNIREF90_PATH \
--mgnify_database_path $MGNIFY_PATH \
--pdb70_database_path $PDB70_PATH \
--uniclust30_database_path $UNICLUST30_PATH \
--jackhmmer_binary_path $JACKHMMER_BIN \
--hhblits_binary_path $HHBLITS_BIN \
--hhsearch_binary_path $HHSEARCH_BIN \
--kalign_binary_path $KALIGN_BIN
Submit this job using the Slurm sbatch command.
[user@biowulf]$ sbatch --cpus-per-task=8 --mem=32g --time=3:00:00 \
--gres=lscratch:50,gpu:1 \
--partition=gpu \
--constraint='gpup100|gpuv100|gpuv100x|gpua100' \
alphalink.sh
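The batch script relies on environment variables that are set when the module is loaded. If one of them is empty, the corresponding option is passed without a value and the failure can be hard to read. A pre-flight check along the following lines may help. This is a sketch only: check_alphalink_env is a hypothetical helper name, and the variable list simply mirrors the names used in the prediction command above.

```shell
#!/bin/bash
# check_alphalink_env
# Verify that each variable used by the prediction command is set and
# non-empty. Lists missing names on stderr and returns non-zero if any
# are missing. Hypothetical helper; not provided by the alphalink module.
check_alphalink_env() {
    local name value missing
    missing=0
    for name in ALPHALINK_CP_CACA \
                UNIREF90_PATH MGNIFY_PATH PDB70_PATH UNICLUST30_PATH \
                JACKHMMER_BIN HHBLITS_BIN HHSEARCH_BIN KALIGN_BIN; do
        # Indirect lookup of the variable called $name. The names come
        # from the fixed list above, so eval is safe here.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Placed in alphalink.sh after the `module load alphalink` line, a call to `check_alphalink_env` would stop the job early under `set -e` when a variable is missing, before any alignment work starts.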