AlphaLink on Biowulf
AlphaLink uses deep learning to predict protein structures from a sequence and a set of experimental contacts. It extends OpenFold by explicitly incorporating crosslinking MS data, or other experimental distance restraints, into the OpenFold architecture.
References:
- "Protein structure prediction with in-cell photo-crosslinking mass spectrometry and deep learning", Nat. Biotech. XXX doi:10.1038/s41587-023-01704-z.
Documentation
- AlphaLink on GitHub
Important Notes
- Module Name: alphalink (see the modules page for more information)
- Some steps in an analysis require a GPU
- Example files in $ALPHALINK_TEST_DATA
- Environment variables set:
  - ALPHALINK_CP_[CACA|DIST] - AlphaLink model checkpoint files
  - [UNIREF90|MGNIFY|PDB70|UNICLUST30]_PATH - Paths to the AlphaFold DBs
  - [JACKHMMER|HHBLITS|HHSEARCH|KALIGN]_BIN - Required programs
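Because a full run spends most of its time generating alignments before it reaches the GPU, it can be worth confirming that the module set every variable listed above before starting. The function below is a minimal sketch; its name, check_alphalink_env, is hypothetical and not part of the alphalink module. It only checks that each variable is non-empty, not that the path exists, since some database variables may be path prefixes rather than files.

```bash
#!/bin/bash
# Sketch: report any of the documented AlphaLink environment variables that
# are unset or empty. check_alphalink_env is a hypothetical helper.
check_alphalink_env() {
    local var missing=0
    for var in ALPHALINK_CP_CACA ALPHALINK_CP_DIST \
               UNIREF90_PATH MGNIFY_PATH PDB70_PATH UNICLUST30_PATH \
               JACKHMMER_BIN HHBLITS_BIN HHSEARCH_BIN KALIGN_BIN; do
        # ${!var} is bash indirect expansion: the value of the variable whose
        # name is stored in $var (empty string if that variable is unset).
        if [[ -z "${!var:-}" ]]; then
            echo "missing: $var" >&2
            missing=1
        fi
    done
    # Exit status 0 if everything is set, 1 if anything is missing.
    return "$missing"
}
```

After `module load alphalink`, a call such as `check_alphalink_env || exit 1` at the top of a batch script stops the job early instead of failing partway through.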
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
In the following example we will predict a structure from a single FASTA sequence and the crosslinking mass-spectrometry residue pairs.
AlphaLink structure prediction
AlphaLink with CSV crosslink restraints
[user@biowulf]$ sinteractive --gres=gpu:1,lscratch:100 --constraint='gpup100|gpuv100|gpuv100x|gpua100' -c 8 --mem=32g
[user@cn3144]$ module load alphalink/1.0
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ cp -r ${ALPHALINK_TEST_DATA:-none}/* .
[user@cn3144]$ time predict_with_crosslinks.py CDK/fasta/CDK.fasta CDK/crosslinks/mixture.csv \
    --checkpoint_path $ALPHALINK_CP_CACA \
    --uniref90_database_path $UNIREF90_PATH \
    --mgnify_database_path $MGNIFY_PATH \
    --pdb70_database_path $PDB70_PATH \
    --uniclust30_database_path $UNICLUST30_PATH \
    --jackhmmer_binary_path $JACKHMMER_BIN \
    --hhblits_binary_path $HHBLITS_BIN \
    --hhsearch_binary_path $HHSEARCH_BIN \
    --kalign_binary_path $KALIGN_BIN
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Loaded OpenFold parameters at /fdb/alphalink/finetuning_model_5_ptm_CACA_10A.pt...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Generating alignments for sp|P24941|CDK2_HUMAN...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Loaded 9 restraints...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Running inference for sp|P24941|CDK2_HUMAN...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Inference time: 56.18929745070636
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Output written to /lscratch/0000000/predictions/sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_unrelaxed.pdb...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Relaxed output written to /lscratch/0000000/predictions/sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_relaxed.pdb...
INFO:/usr/local/apps/alphalink/1.0/AlphaLink/predict_with_crosslinks.py:Model output written to /lscratch/0000000/predictions/sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_output_dict.pkl...

real    44m12.596s
user    159m13.407s
sys     4m3.506s
[user@cn3144]$ ls -lh predictions
total 153M
-rw-r--r-- 1 user group 153M Oct 17 12:26 'sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_output_dict.pkl'
-rw-r--r-- 1 user group 384K Oct 17 12:26 'sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_relaxed.pdb'
-rw-r--r-- 1 user group 190K Oct 17 12:25 'sp|P24941|CDK2_HUMAN_model_5_ptm_crosslinks_unrelaxed.pdb'
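The session above writes its results under /lscratch/$SLURM_JOB_ID, which is removed when the job ends, so the predictions need to be copied to permanent storage first. The output names also carry the `|` characters from the FASTA header, which must be quoted every time they are used in a shell. The function below is a sketch of one way to handle both issues. save_predictions is a hypothetical helper, and replacing `|` with `_` is simply a naming choice for convenience.

```bash
#!/bin/bash
# Sketch: copy AlphaLink outputs from local scratch to a permanent directory,
# replacing every '|' in the file names with '_'.
# save_predictions is a hypothetical helper, not part of AlphaLink.
save_predictions() {
    local src=$1 dest=$2 f name safe
    mkdir -p "$dest"
    for f in "$src"/*; do
        [[ -f "$f" ]] || continue   # skip non-files and an unmatched glob
        name=${f##*/}               # strip the leading directory part
        safe=${name//|/_}           # replace every '|' with '_'
        cp -- "$f" "$dest/$safe"
    done
}
```

For example, before leaving the interactive session: `save_predictions /lscratch/$SLURM_JOB_ID/predictions /data/$USER/alphalink/CDK2`. The destination here is only an illustration; any directory in your data area will do.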
Batch job
Most jobs should be run as batch jobs.
Structure prediction with crosslinking restraints (run at most 10 of these jobs concurrently)
#! /bin/bash
set -e
module load alphalink
predict_with_crosslinks.py protein.fasta crosslinks.csv \
    --checkpoint_path $ALPHALINK_CP_CACA \
    --uniref90_database_path $UNIREF90_PATH \
    --mgnify_database_path $MGNIFY_PATH \
    --pdb70_database_path $PDB70_PATH \
    --uniclust30_database_path $UNICLUST30_PATH \
    --jackhmmer_binary_path $JACKHMMER_BIN \
    --hhblits_binary_path $HHBLITS_BIN \
    --hhsearch_binary_path $HHSEARCH_BIN \
    --kalign_binary_path $KALIGN_BIN
Submit this job using the Slurm sbatch command.
[user@biowulf]$ sbatch --cpus-per-task=8 --mem=32g --time=3:00:00 \
    --gres=lscratch:50,gpu:1 \
    --partition=gpu \
    --constraint='gpup100|gpuv100|gpuv100x|gpua100' \
    alphalink.sh
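To predict several structures while keeping to the limit of 10 concurrent jobs, one option is a Slurm job array whose concurrency is capped with the `%` throttle in `--array`. The script below is a sketch under stated assumptions. The manifest file jobs.txt, its one-task-per-line `<fasta> <crosslinks.csv>` format, the script name alphalink_array.sh, and the per-task `task_N` working directories are all hypothetical choices. The per-task directory is there so that concurrent tasks do not write into the same predictions/ folder.

```bash
#!/bin/bash
# alphalink_array.sh: hypothetical job-array variant of alphalink.sh.
# Assumption: jobs.txt in the submission directory holds one task per line,
#   <fasta> <crosslinks.csv>
# separated by whitespace, with no spaces inside the paths.
set -e

# Print line N (1-based) of a manifest file: manifest_line FILE N
manifest_line() {
    sed -n "${2}p" "$1"
}

run_task() {
    local fasta csv
    read -r fasta csv <<< "$(manifest_line jobs.txt "$SLURM_ARRAY_TASK_ID")"
    if [[ -z "$fasta" || -z "$csv" ]]; then
        echo "jobs.txt has no entry for task $SLURM_ARRAY_TASK_ID" >&2
        return 1
    fi
    # Resolve input paths before moving into a per-task directory, so each
    # task writes its own predictions/ folder.
    fasta=$(realpath "$fasta")
    csv=$(realpath "$csv")
    mkdir -p "task_$SLURM_ARRAY_TASK_ID"
    cd "task_$SLURM_ARRAY_TASK_ID"

    module load alphalink
    predict_with_crosslinks.py "$fasta" "$csv" \
        --checkpoint_path "$ALPHALINK_CP_CACA" \
        --uniref90_database_path "$UNIREF90_PATH" \
        --mgnify_database_path "$MGNIFY_PATH" \
        --pdb70_database_path "$PDB70_PATH" \
        --uniclust30_database_path "$UNICLUST30_PATH" \
        --jackhmmer_binary_path "$JACKHMMER_BIN" \
        --hhblits_binary_path "$HHBLITS_BIN" \
        --hhsearch_binary_path "$HHSEARCH_BIN" \
        --kalign_binary_path "$KALIGN_BIN"
}

# Do the work only when Slurm has launched this script as an array task.
if [[ -n "${SLURM_ARRAY_TASK_ID:-}" ]]; then
    run_task
fi
```

It could be submitted with the same resources as the single job, adding an array range that runs at most 10 tasks at once: `sbatch --array=1-$(wc -l < jobs.txt)%10 --cpus-per-task=8 --mem=32g --time=3:00:00 --gres=lscratch:50,gpu:1 --partition=gpu --constraint='gpup100|gpuv100|gpuv100x|gpua100' alphalink_array.sh`.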