ProteinMPNN is a deep learning–based protein sequence design method. Unlike AlphaFold and RoseTTAFold, which predict a protein's structure from its sequence, ProteinMPNN addresses the inverse problem: given a protein backbone, it designs an amino acid sequence that is expected to fold into that backbone.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ interactive --gres=gpu:a100:1,lscratch:10 --mem=24g -c8
[user@cn0071 ~]$ module load ProteinMPNN
[+] Loading singularity 4.1.5 on cn0071
[+] Loading ProteinMPNN 1.0.1
[user@cn0071 ~]$ protein_mpnn_run.py -h
usage: protein_mpnn_run.py [-h] [--ca_only]
                           [--path_to_model_weights PATH_TO_MODEL_WEIGHTS]
                           [--model_name MODEL_NAME] [--seed SEED]
                           [--save_score SAVE_SCORE] [--save_probs SAVE_PROBS]
                           [--score_only SCORE_ONLY]
                           [--conditional_probs_only CONDITIONAL_PROBS_ONLY]
                           [--conditional_probs_only_backbone CONDITIONAL_PROBS_ONLY_BACKBONE]
                           [--unconditional_probs_only UNCONDITIONAL_PROBS_ONLY]
                           [--backbone_noise BACKBONE_NOISE]
                           [--num_seq_per_target NUM_SEQ_PER_TARGET]
                           [--batch_size BATCH_SIZE] [--max_length MAX_LENGTH]
                           [--sampling_temp SAMPLING_TEMP]
                           [--out_folder OUT_FOLDER] [--pdb_path PDB_PATH]
                           [--pdb_path_chains PDB_PATH_CHAINS]
                           [--jsonl_path JSONL_PATH]
                           [--chain_id_jsonl CHAIN_ID_JSONL]
                           [--fixed_positions_jsonl FIXED_POSITIONS_JSONL]
                           [--omit_AAs OMIT_AAS]
                           [--bias_AA_jsonl BIAS_AA_JSONL]
                           [--bias_by_res_jsonl BIAS_BY_RES_JSONL]
                           [--omit_AA_jsonl OMIT_AA_JSONL]
                           [--pssm_jsonl PSSM_JSONL] [--pssm_multi PSSM_MULTI]
                           [--pssm_threshold PSSM_THRESHOLD]
                           [--pssm_log_odds_flag PSSM_LOG_ODDS_FLAG]
                           [--pssm_bias_flag PSSM_BIAS_FLAG]
                           [--tied_positions_jsonl TIED_POSITIONS_JSONL]

options:
  -h, --help            show this help message and exit
  --ca_only             Parse CA-only structures and use CA-only models
                        (default: false) (default: False)
  --path_to_model_weights PATH_TO_MODEL_WEIGHTS
                        Path to model weights folder; (default: )
  --model_name MODEL_NAME
                        ProteinMPNN model name: v_48_002, v_48_010, v_48_020,
                        v_48_030; v_48_010=version with 48 edges 0.10A noise
                        (default: v_48_020)
  --seed SEED           If set to 0 then a random seed will be picked;
                        (default: 0)
  --save_score SAVE_SCORE
                        0 for False, 1 for True; save score=-log_prob to npy
                        files (default: 0)
  --save_probs SAVE_PROBS
                        0 for False, 1 for True; save MPNN predicted
                        probabilites per position (default: 0)
  --score_only SCORE_ONLY
                        0 for False, 1 for True; score input backbone-sequence
                        pairs (default: 0)
  --conditional_probs_only CONDITIONAL_PROBS_ONLY
                        0 for False, 1 for True; output conditional
                        probabilities p(s_i given the rest of the sequence and
                        backbone) (default: 0)
  --conditional_probs_only_backbone CONDITIONAL_PROBS_ONLY_BACKBONE
                        0 for False, 1 for True; if true output conditional
                        probabilities p(s_i given backbone) (default: 0)
  --unconditional_probs_only UNCONDITIONAL_PROBS_ONLY
                        0 for False, 1 for True; output unconditional
                        probabilities p(s_i given backbone) in one forward
                        pass (default: 0)
  --backbone_noise BACKBONE_NOISE
                        Standard deviation of Gaussian noise to add to
                        backbone atoms (default: 0.0)
  --num_seq_per_target NUM_SEQ_PER_TARGET
                        Number of sequences to generate per target
                        (default: 1)
  --batch_size BATCH_SIZE
                        Batch size; can set higher for titan, quadro GPUs,
                        reduce this if running out of GPU memory (default: 1)
  --max_length MAX_LENGTH
                        Max sequence length (default: 200000)
  --sampling_temp SAMPLING_TEMP
                        A string of temperatures, 0.2 0.25 0.5. Sampling
                        temperature for amino acids. Suggested values 0.1,
                        0.15, 0.2, 0.25, 0.3. Higher values will lead to more
                        diversity. (default: 0.1)
  --out_folder OUT_FOLDER
                        Path to a folder to output sequences, e.g. /home/out/
                        (default: None)
  --pdb_path PDB_PATH   Path to a single PDB to be designed (default: )
  --pdb_path_chains PDB_PATH_CHAINS
                        Define which chains need to be designed for a single
                        PDB (default: )
  --jsonl_path JSONL_PATH
                        Path to a folder with parsed pdb into jsonl
                        (default: None)
  --chain_id_jsonl CHAIN_ID_JSONL
                        Path to a dictionary specifying which chains need to
                        be designed and which ones are fixed, if not specied
                        all chains will be designed. (default: )
  --fixed_positions_jsonl FIXED_POSITIONS_JSONL
                        Path to a dictionary with fixed positions (default: )
  --omit_AAs OMIT_AAS   Specify which amino acids should be omitted in the
                        generated sequence, e.g. 'AC' would omit alanine and
                        cystine. (default: X)
  --bias_AA_jsonl BIAS_AA_JSONL
                        Path to a dictionary which specifies AA composion bias
                        if neededi, e.g. {A: -1.1, F: 0.7} would make A less
                        likely and F more likely. (default: )
  --bias_by_res_jsonl BIAS_BY_RES_JSONL
                        Path to dictionary with per position bias. (default: )
  --omit_AA_jsonl OMIT_AA_JSONL
                        Path to a dictionary which specifies which amino acids
                        need to be omited from design at specifi chain indices
                        (default: )
  --pssm_jsonl PSSM_JSONL
                        Path to a dictionary with pssm (default: )
  --pssm_multi PSSM_MULTI
                        A value between [0.0, 1.0], 0.0 means do not use pssm,
                        1.0 ignore MPNN predictions (default: 0.0)
  --pssm_threshold PSSM_THRESHOLD
                        A value between -inf + inf to restric per position AAs
                        (default: 0.0)
  --pssm_log_odds_flag PSSM_LOG_ODDS_FLAG
                        0 for False, 1 for True (default: 0)
  --pssm_bias_flag PSSM_BIAS_FLAG
                        0 for False, 1 for True (default: 0)
  --tied_positions_jsonl TIED_POSITIONS_JSONL
                        Path to a dictionary with tied positions (default: )

[user@cn0071 ~]$ git clone https://github.com/dauparas/ProteinMPNN
[user@cn0071 ~]$ protein_mpnn_run.py \
    --pdb_path ProteinMPNN/inputs/PDB_complexes/pdbs/3HTN.pdb \
    --out_folder ./3HTN_designs/ \
    --num_seq_per_target 10 \
    --sampling_temp "0.1" \
    --seed 0 \
    --batch_size 1 \
    --model_name v_48_020
----------------------------------------
chain_id_jsonl is NOT loaded
----------------------------------------
fixed_positions_jsonl is NOT loaded
----------------------------------------
pssm_jsonl is NOT loaded
----------------------------------------
omit_AA_jsonl is NOT loaded
----------------------------------------
bias_AA_jsonl is NOT loaded
----------------------------------------
tied_positions_jsonl is NOT loaded
----------------------------------------
bias by residue dictionary is not loaded, or not provided
----------------------------------------
----------------------------------------
Number of edges: 48
Training noise level: 0.2A
Generating sequences for: 3HTN
10 sequences of length 429 generated in 9.4706 seconds

[user@cn0071 ~]$ dl_interface_design.py -h
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 PyRosetta-4                                  │
│              Created in JHU by Sergey Lyskov and PyRosetta Team              │
│              (C) Copyright Rosetta Commons Member Institutions               │
│                                                                              │
│ NOTE: USE OF PyRosetta FOR COMMERCIAL PURPOSES REQUIRE PURCHASE OF A LICENSE │
│         See LICENSE.PyRosetta.md or email license@uw.edu for details         │
└──────────────────────────────────────────────────────────────────────────────┘
PyRosetta-4 2024 [Rosetta PyRosetta4.Release.python39.linux 2024.39+release.59628fbc5bc09f1221e1642f1f8d157ce49b1410 2024-09-23T07:49:48] retrieved from: http://www.pyrosetta.org
usage: dl_interface_design.py [-h] [-pdbdir PDBDIR] [-silent SILENT]
                              [-outpdbdir OUTPDBDIR] [-outsilent OUTSILENT]
                              [-runlist RUNLIST]
                              [-checkpoint_name CHECKPOINT_NAME] [-debug]
                              [-relax_cycles RELAX_CYCLES]
                              [-output_intermediates]
                              [-seqs_per_struct SEQS_PER_STRUCT]
                              [-checkpoint_path CHECKPOINT_PATH]
                              [-temperature TEMPERATURE]
                              [-augment_eps AUGMENT_EPS]
                              [-protein_features PROTEIN_FEATURES]
                              [-omit_AAs OMIT_AAS]
                              [-bias_AA_jsonl BIAS_AA_JSONL]
                              [-num_connections NUM_CONNECTIONS]

optional arguments:
  -h, --help            show this help message and exit
  -pdbdir PDBDIR        The name of a directory of pdbs to run through the
                        model
  -silent SILENT        The name of a silent file to run through the model
  -outpdbdir OUTPDBDIR  The directory to which the output PDB files will be
                        written, used if the -pdbdir arg is active
  -outsilent OUTSILENT  The name of the silent file to which output structs
                        will be written, used if the -silent arg is active
  -runlist RUNLIST      The path of a list of pdb tags to run, only active
                        when the -pdbdir arg is active (default: ''; Run all
                        PDBs)
  -checkpoint_name CHECKPOINT_NAME
                        The name of a file where tags which have finished will
                        be written (default: check.point)
  -debug                When active, errors will cause the script to crash and
                        the error message to be printed out (default: False)
  -relax_cycles RELAX_CYCLES
                        The number of relax cycles to perform on each
                        structure (default: 1)
  -output_intermediates
                        Whether to write all intermediate sequences from the
                        relax cycles to disk (default: False)
  -seqs_per_struct SEQS_PER_STRUCT
                        The number of sequences to generate for each structure
                        (default: 1)
  -checkpoint_path CHECKPOINT_PATH
                        The path to the ProteinMPNN weights you wish to use,
                        default /opt/dl_binder_design/mpnn_fr/ProteinMPNN/vanilla_model_weights/v_48_020.pt
  -temperature TEMPERATURE
                        The sampling temperature to use when running
                        ProteinMPNN (default: 0.000001)
  -augment_eps AUGMENT_EPS
                        The variance of random noise to add to the atomic
                        coordinates (default 0)
  -protein_features PROTEIN_FEATURES
                        What type of protein features to input to ProteinMPNN
                        (default: full)
  -omit_AAs OMIT_AAS    A string of all residue types (one letter
                        case-insensitive) that you would not like to use for
                        design. Letters not corresponding to residue types
                        will be ignored (default: CX)
  -bias_AA_jsonl BIAS_AA_JSONL
                        The path to a JSON file containing a dictionary
                        mapping residue one-letter names to the bias for that
                        residue eg. {A: -1.1, F: 0.7} (default: ; no bias)
  -num_connections NUM_CONNECTIONS
                        Number of neighbors each residue is connected to. Do
                        not mess around with this argument unless you have a
                        specific set of ProteinMPNN weights which expects a
                        different number of connections. (default: 48)

[user@cn0071 ~]$ dl_interface_design.py -silent ./dl_binder_design/examples/inputs/proteinmpnn_output.silent
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 PyRosetta-4                                  │
│              Created in JHU by Sergey Lyskov and PyRosetta Team              │
│              (C) Copyright Rosetta Commons Member Institutions               │
│                                                                              │
│ NOTE: USE OF PyRosetta FOR COMMERCIAL PURPOSES REQUIRE PURCHASE OF A LICENSE │
│         See LICENSE.PyRosetta.md or email license@uw.edu for details         │
└──────────────────────────────────────────────────────────────────────────────┘
PyRosetta-4 2024 [Rosetta PyRosetta4.Release.python39.linux 2024.39+release.59628fbc5bc09f1221e1642f1f8d157ce49b1410 2024-09-23T07:49:48] retrieved from: http://www.pyrosetta.org
Found GPU will run ProteinMPNN on GPU
Attempting pose: design_ppi_0_dldesign_0
ProteinMPNN generated 1 sequences in 4 seconds
Running FastRelax
Completed one cycle of FastRelax in 44.01047921180725 seconds
ProteinMPNN generated 1 sequences in 0 seconds
Struct: design_ppi_0_dldesign_0 reported success in 48 seconds
Attempting pose: design_ppi_0_dldesign_1
ProteinMPNN generated 1 sequences in 0 seconds
Running FastRelax
Completed one cycle of FastRelax in 42.61138868331909 seconds
ProteinMPNN generated 1 sequences in 0 seconds
Struct: design_ppi_0_dldesign_1 reported success in 43 seconds
Attempting pose: design_ppi_0_dldesign_2
ProteinMPNN generated 1 sequences in 0 seconds
Running FastRelax
Completed one cycle of FastRelax in 45.7072536945343 seconds
ProteinMPNN generated 1 sequences in 0 seconds
...
Struct: design_ppi_9_dldesign_2 reported success in 81 seconds
Attempting pose: design_ppi_9_dldesign_3
ProteinMPNN generated 1 sequences in 0 seconds
Running FastRelax
Completed one cycle of FastRelax in 63.77409362792969 seconds
ProteinMPNN generated 1 sequences in 0 seconds
Struct: design_ppi_9_dldesign_3 reported success in 65 seconds

[user@cn0071 ~]$ af2_predict.py -h
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 PyRosetta-4                                  │
│              Created in JHU by Sergey Lyskov and PyRosetta Team              │
│              (C) Copyright Rosetta Commons Member Institutions               │
│                                                                              │
│ NOTE: USE OF PyRosetta FOR COMMERCIAL PURPOSES REQUIRE PURCHASE OF A LICENSE │
│         See LICENSE.PyRosetta.md or email license@uw.edu for details         │
└──────────────────────────────────────────────────────────────────────────────┘
PyRosetta-4 2024 [Rosetta PyRosetta4.Release.python311.linux 2024.39+release.59628fbc5bc09f1221e1642f1f8d157ce49b1410 2024-09-23T07:49:48] retrieved from: http://www.pyrosetta.org
usage: predict.py [-h] [-pdbdir PDBDIR] [-silent SILENT]
                  [-outpdbdir OUTPDBDIR] [-outsilent OUTSILENT]
                  [-runlist RUNLIST] [-checkpoint_name CHECKPOINT_NAME]
                  [-scorefilename SCOREFILENAME] [-maintain_res_numbering]
                  [-debug] [-max_amide_dist MAX_AMIDE_DIST]
                  [-recycle RECYCLE] [-no_initial_guess] [-force_monomer]

options:
  -h, --help            show this help message and exit
  -pdbdir PDBDIR        The name of a directory of pdbs to run through the
                        model
  -silent SILENT        The name of a silent file to run through the model
  -outpdbdir OUTPDBDIR  The directory to which the output PDB files will be
                        written. Only used when -pdbdir is active
  -outsilent OUTSILENT  The name of the silent file to which output structs
                        will be written. Only used when -silent is active
  -runlist RUNLIST      The path of a list of pdb tags to run. Only used when
                        -pdbdir is active (default: ''; Run all PDBs)
  -checkpoint_name CHECKPOINT_NAME
                        The name of a file where tags which have finished will
                        be written (default: check.point)
  -scorefilename SCOREFILENAME
                        The name of a file where scores will be written
                        (default: out.sc)
  -maintain_res_numbering
                        When active, the model will not renumber the residues
                        when bad inputs are encountered (default: False)
  -debug                When active, errors will cause the script to crash and
                        the error message to be printed out (default: False)
  -max_amide_dist MAX_AMIDE_DIST
                        The maximum distance between an amide bond's carbon
                        and nitrogen (default: 3.0)
  -recycle RECYCLE      The number of AF2 recycles to perform (default: 3)
  -no_initial_guess     When active, the model will not use an initial guess
                        (default: False)
  -force_monomer        When active, the model will predict the structure of a
                        monomer (default: False)

[user@cn0071 ~]$ cp -r $PDB_DATA .
[user@cn0071 ~]$ mkdir -p output_dir
[user@cn0071 ~]$ af2_predict.py -pdbdir $PROTEINMPNN_DATA -outpdbdir output_dir
...
[user@cn0071 ~]$

End the interactive session:
[user@cn0071 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
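
The design step from the sample session can also be run non-interactively. The following is a minimal sketch of a Slurm batch script, not a tested recipe. It assumes the same resources requested with `interactive` above can be requested through `#SBATCH` directives, and that GPU nodes sit in a partition named `gpu` (an assumption; adjust for your cluster). The helper `design_one` and the `MPNN_CMD` override variable are hypothetical names introduced here for illustration; the `protein_mpnn_run.py` flags and values are copied from the session above.

```shell
#!/bin/bash
# Sketch only. Resource values mirror the interactive example above;
# the partition name "gpu" is an assumption about the cluster setup.
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:1,lscratch:10
#SBATCH --mem=24g
#SBATCH --cpus-per-task=8

# design_one is a hypothetical helper, not part of ProteinMPNN.
# It runs protein_mpnn_run.py on one PDB file with the flags used in the
# sample session and writes results to <out_root>/<pdb name>_designs/.
# MPNN_CMD (hypothetical) can be set to another command, e.g. "echo",
# to print the arguments instead of running the design.
design_one() {
    local pdb=$1
    local out_root=$2
    local name
    name=$(basename "$pdb" .pdb)
    "${MPNN_CMD:-protein_mpnn_run.py}" \
        --pdb_path "$pdb" \
        --out_folder "${out_root}/${name}_designs/" \
        --num_seq_per_target 10 \
        --sampling_temp "0.1" \
        --seed 0 \
        --batch_size 1 \
        --model_name v_48_020
}

# In a real job, uncomment the lines below to load the module and design
# every PDB file in the example directory of the cloned repository.
# module load ProteinMPNN
# for pdb in ProteinMPNN/inputs/PDB_complexes/pdbs/*.pdb; do
#     design_one "$pdb" .
# done
```

Saved under a name of your choice and with the loop uncommented, the script would be submitted with `sbatch`. Setting `MPNN_CMD=echo` before calling `design_one` gives a dry run that only prints the arguments.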