AlphaLink predicts protein structures using deep learning, given a sequence and a set of experimental contacts. It extends OpenFold with crosslinking MS data or other experimental distance restraints by explicitly incorporating them into the OpenFold architecture.
Module name: alphalink (see the modules page for more information)
Test data: $ALPHALINK_TEST_DATA
Environment variables:
- ALPHALINK_CP_[CACA|DIST] - AlphaLink model checkpoint files
- [UNIREF90|MGNIFY|PDB70|UNICLUST30]_PATH - Paths to AlphaFold databases
- [JACKHMMER|HHBLITS|HHSEARCH|KALIGN]_BIN - Required programs

In the following example we will predict a structure from a single FASTA sequence and the crosslinking mass-spectrometry residue pairs.
[user@biowulf]$ sinteractive --gres=gpu:1,lscratch:100 --constraint='gpup100|gpuv100|gpuv100x|gpua100' -c 8 --mem=32g
[user@cn3144]$ module load alphalink/1.0
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ cp -r ${ALPHALINK_TEST_DATA:-none}/* .
[user@cn3144]$ time predict_with_crosslinks.py CDK/fasta/CDK.fasta CDK/crosslinks/mixture.csv \
    --checkpoint_path $ALPHALINK_CP_CACA \
    --uniref90_database_path $UNIREF90_PATH \
    --mgnify_database_path $MGNIFY_PATH \
    --pdb70_database_path $PDB70_PATH \
    --uniclust30_database_path $UNICLUST30_PATH \
    --jackhmmer_binary_path $JACKHMMER_BIN \
    --hhblits_binary_path $HHBLITS_BIN \
    --hhsearch_binary_path $HHSEARCH_BIN \
    --kalign_binary_path $KALIGN_BIN
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Loaded OpenFold parameters at /fdb/alphalink/finetuning_model_5_ptm_CACA_10A.pt...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Generating alignments for sp|P24941|CDK2_HUMAN...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Loaded 9 restraints...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Running inference for sp|P24941|CDK2_HUMAN...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Inference time: 56.18929745070636
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Output written to /lscratch/0000000/predictions/sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_unrelaxed.pdb...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Relaxed output written to /lscratch/0000000/predictions/sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_relaxed.pdb...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Model output written to /lscratch/0000000/predictions/sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_output_dict.pkl...

real 44m12.596s
user 159m13.407s
sys 4m3.506s

[user@cn3144]$ ls -lh predictions
total 153M
-rw-r--r-- 1 user group 153M Oct 17 12:26 'sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_output_dict.pkl'
-rw-r--r-- 1 user group 384K Oct 17 12:26 'sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_relaxed.pdb'
-rw-r--r-- 1 user group 190K Oct 17 12:25 'sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_unrelaxed.pdb'
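A quick way to sanity-check a finished prediction is to compare the number of residues in the input FASTA file with the number of C-alpha atoms in the output PDB file. The sketch below is only an illustration: the function names `fasta_length` and `count_ca_atoms` are hypothetical helpers, not part of AlphaLink, and the check assumes a single-sequence FASTA file as used in the example above.

```shell
# Hypothetical helpers for a quick consistency check; not part of AlphaLink.

# Count residues in a FASTA file: every character on non-header lines,
# ignoring spaces, tabs and carriage returns. Assumes a single sequence.
fasta_length() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

# Count C-alpha atoms in a PDB file using the fixed-column layout:
# columns 1-6 hold the record name, columns 13-16 the atom name.
count_ca_atoms() {
    awk 'substr($0, 1, 6) == "ATOM  " && substr($0, 13, 4) == " CA " { n++ } END { print n + 0 }' "$1"
}
```

For the session above, `fasta_length CDK/fasta/CDK.fasta` and `count_ca_atoms 'predictions/sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_relaxed.pdb'` would be expected to print the same number for a complete model. The output file names contain `|`, so they need to be quoted on the command line.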
Create a batch script (e.g. alphalink.sh) similar to the following:

#! /bin/bash
set -e
module load alphalink
predict_with_crosslinks.py protein.fasta crosslinks.csv \
    --checkpoint_path $ALPHALINK_CP_CACA \
    --uniref90_database_path $UNIREF90_PATH \
    --mgnify_database_path $MGNIFY_PATH \
    --pdb70_database_path $PDB70_PATH \
    --uniclust30_database_path $UNICLUST30_PATH \
    --jackhmmer_binary_path $JACKHMMER_BIN \
    --hhblits_binary_path $HHBLITS_BIN \
    --hhsearch_binary_path $HHSEARCH_BIN \
    --kalign_binary_path $KALIGN_BIN
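Because a batch script runs unattended, an unset variable would only show up as a failed job. The following is a minimal sketch of a preflight check that could be placed after `module load alphalink` in such a script. The function name `check_alphalink_env` is a hypothetical helper, not something the module provides, and the list covers only the variables used in the script above.

```shell
# Hypothetical preflight helper; not provided by the alphalink module.
# Reports every variable used by the batch script that is unset or empty
# and returns a non-zero status if any are missing.
check_alphalink_env() {
    missing=0
    for name in ALPHALINK_CP_CACA \
                UNIREF90_PATH MGNIFY_PATH PDB70_PATH UNICLUST30_PATH \
                JACKHMMER_BIN HHBLITS_BIN HHSEARCH_BIN KALIGN_BIN; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

With `set -e` active, as in the script above, a bare call to `check_alphalink_env` stops the job before any alignment work starts if a variable is missing.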
Submit this job using the Slurm sbatch command.
[user@biowulf]$ sbatch --cpus-per-task=8 --mem=32g --time=3:00:00 \
    --gres=lscratch:50,gpu:1 \
    --partition=gpu \
    --constraint='gpup100|gpuv100|gpuv100x|gpua100' \
    alphalink.sh
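When one script has to process several targets, the long flag list can be kept in a single shell function. This is a sketch under stated assumptions: `run_alphalink` and `run_all_targets` are hypothetical names, `PREDICT_CMD` is an illustrative override for dry runs rather than an alphalink setting, and the directory layout (one `.csv` crosslink file per `.fasta` file, matched by base name) is assumed for illustration. Only flags that appear in the examples above are used.

```shell
# Hypothetical wrapper around the command used in the examples above.
# PREDICT_CMD is an illustrative override for dry runs, not an alphalink setting.
run_alphalink() {
    "${PREDICT_CMD:-predict_with_crosslinks.py}" "$1" "$2" \
        --checkpoint_path "$ALPHALINK_CP_CACA" \
        --uniref90_database_path "$UNIREF90_PATH" \
        --mgnify_database_path "$MGNIFY_PATH" \
        --pdb70_database_path "$PDB70_PATH" \
        --uniclust30_database_path "$UNICLUST30_PATH" \
        --jackhmmer_binary_path "$JACKHMMER_BIN" \
        --hhblits_binary_path "$HHBLITS_BIN" \
        --hhsearch_binary_path "$HHSEARCH_BIN" \
        --kalign_binary_path "$KALIGN_BIN"
}

# Run every FASTA file in directory $1 with the crosslink file of the same
# base name in directory $2. This layout is an assumption for illustration.
run_all_targets() {
    for fasta in "$1"/*.fasta; do
        [ -e "$fasta" ] || continue          # skip if the glob matched nothing
        name=$(basename "$fasta" .fasta)
        run_alphalink "$fasta" "$2/$name.csv" || return 1
    done
}
```

Setting `PREDICT_CMD=echo` prints the commands instead of running them, which is a cheap way to check the pairing of files before submitting. Targets run one after another, and the interactive example above took about 44 minutes of wall time, so the `--time` request would need to grow with the number of targets.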