heme_binder_diffusion: de novo heme binding protein design pipeline using RFdiffusionAA



RoseTTAFold All-Atom (RFAA) is a deep network capable of modeling full biological assemblies containing proteins, nucleic acids, small molecules, metals, and covalent modifications, given the sequences of the polymers and the atomic bonded geometry of the small molecules and covalent modifications. The heme_binder_diffusion pipeline uses RFdiffusion All-Atom (RFdiffusionAA), a diffusion model built on RoseTTAFold All-Atom, to design heme-binding proteins de novo.

Reference:

Rohith Krishna, Jue Wang, Woody Ahern, Pascal Sturmfels, Preetham Venkatesh, Indrek Kalvet, Gyu Rie Lee, Felix S Morey-Burrows, Ivan Anishchenko, Ian R Humphreys, Ryan McHugh, Dionne Vafeados, Xinting Li, George A Sutherland, Andrew Hitchcock, C Neil Hunter, Minkyung Baek, Frank DiMaio, David Baker.
Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom.
bioRxiv preprint, doi: https://doi.org/10.1101/2023.10.09.561603

Introducing All-Atom versions of RoseTTAFold and RFdiffusion

Documentation

Important Notes

Module Name: heme_binder_diffusion (see the modules page for more information)
Unusual environment variables set
- HBD_HOME installation directory
- HBD_BIN executable directory
- HBD_SRC source code directory
- HBD_DATA sample data directory
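After loading the module, a quick way to confirm that the four variables above are set is a small bash function such as the one below. It is a minimal sketch: check_hbd_env is a hypothetical helper, not something the module provides, and it only checks that each variable is non-empty, not that the directories exist.

```shell
# Minimal sketch (bash): report which HBD_* variables are set.
# check_hbd_env is a hypothetical helper, not part of the module.
check_hbd_env() {
  local var missing=0
  for var in HBD_HOME HBD_BIN HBD_SRC HBD_DATA; do
    # ${!var:-} is bash indirect expansion: the value of the variable named by $var
    if [ -n "${!var:-}" ]; then
      printf '%s=%s\n' "$var" "${!var}"
    else
      printf '%s is not set\n' "$var"
      missing=1
    fi
  done
  return "$missing"   # non-zero if any variable was missing
}

# Usage, after "module load heme_binder_diffusion":
#   check_hbd_env && ls "$HBD_DATA"
```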

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=20g -c8 --gres=gpu:k80:1,lscratch:10
[user@cn3335 ~]$ module load heme_binder_diffusion 
[+] Loading git 2.39.2  ...
[+] Loading jupyter
[+] Loading apptainer  1.1.6  on cn3335
[+] Loading heme_binder_diffusion  20240319

To test the RFdiffusion-AA application:

[user@cn3335 ~]$ git clone https://github.com/baker-laboratory/rf_diffusion_all_atom
[user@cn3335 ~]$ git clone https://github.com/baker-laboratory/RoseTTAFold-All-Atom 
[user@cn3335 ~]$ cd rf_diffusion_all_atom
[user@cn3335 ~]$ rm -rf rf2aa && ln -s ../RoseTTAFold-All-Atom/rf2aa
[user@cn3335 ~]$ wget http://files.ipd.uw.edu/pub/RF-All-Atom/weights/RFDiffusionAA_paper_weights.pt
[user@cn3335 ~]$ run_aa  -u run_inference.py inference.deterministic=True diffuser.T=100  \
                              inference.output_prefix=output/ligand_only/sample  \
                              inference.input_pdb=input/7v11.pdb  \
                              contigmap.contigs=[\'150-150\']  \
                              inference.ligand=OQO
...
Calculating chi_beta_T dictionary...
Done calculating chi_beta_T dictionaries. They are now cached.
Done calculating chi_beta_T, chi_alphas_T, and chi_abars_T dictionaries.
[2024-03-14 10:25:36,292][inference.model_runners][INFO] - 10:25:36: Timestep 100, current sequence: ??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
[2024-03-14 10:25:37,598][inference.model_runners][INFO] - 10:25:37: Timestep 99, current sequence: ??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
...
[2024-03-14 10:47:36,968][inference.model_runners][INFO] - 10:47:36: Timestep 2, current sequence: ?ASAS?SASA?S??SASAAA???SAAAA???S?SSA???S???K?Q??A???K?K???????S???A??SA????R?ASA?A????A?AAA?SA?AS?AS??K?KASSAASAAA??AA??AA?AA?A?S?K?KS?AA?ASAAASAAAA?Saaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
[2024-03-14 10:47:38,249][inference.model_runners][INFO] - 10:47:38: Timestep 1, current sequence: ?ASAS?SASAAS??SASAAS??ASASA??A?S?ASA???S???K?Q??A???K?K????A??S???S??SA????R?ASA?A??A???SAA?SA?AA??S??K?K?SAAASAAAA?AAASAA?AA?A?S?K?KS?AS?AAAAASAAAA?Saaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
[2024-03-14 10:47:38,251][__main__][INFO] - Finished design in 2.14 minutes
[2024-03-14 10:47:42,543][__main__][INFO] - design : ./output/ligand_only/sample_9.pdb
[2024-03-14 10:47:42,543][__main__][INFO] - Xt traj: ./output/ligand_only/traj/sample_9_Xt-1_traj.pdb
[2024-03-14 10:47:42,543][__main__][INFO] - X0 traj: ./output/ligand_only/traj/sample_9_pX0_traj.pdb
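The log reports, for each sample, one design PDB plus two trajectory files under traj/. To see at a glance which designs a run has written, a bash sketch like the following can help. It assumes the file naming shown in the log above (<prefix>_N.pdb for designs, traj/<name>_N_pX0_traj.pdb for the X0 trajectory); list_designs is a hypothetical helper, not part of RFdiffusionAA.

```shell
# Minimal sketch (bash): list design PDBs for an inference.output_prefix,
# flagging those that have an X0 trajectory under traj/.
# list_designs is a hypothetical helper; file names follow the log above.
list_designs() {
  local prefix=${1:-}
  if [ -z "$prefix" ]; then
    echo "usage: list_designs OUTPUT_PREFIX" >&2
    return 1
  fi
  local dir name f n
  dir=$(dirname "$prefix")
  name=$(basename "$prefix")
  for f in "$dir/$name"_*.pdb; do
    [ -e "$f" ] || continue            # pattern matched no files
    n=$(basename "$f" .pdb)
    n=${n##*_}                         # design index, e.g. 9 in sample_9
    if [ -e "$dir/traj/${name}_${n}_pX0_traj.pdb" ]; then
      echo "$f (with trajectory)"
    else
      echo "$f"
    fi
  done
}

# Usage, from the rf_diffusion_all_atom directory:
#   list_designs output/ligand_only/sample
```

Designs are listed in shell glob order, so sample_10 sorts before sample_2.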

End the interactive session:

[user@cn3335 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226

To run the heme_binder_diffusion (HBD) pipeline:

[user@biowulf]$ sinteractive --mem=20g -c8 --gres=gpu:k80:1,lscratch:10 --tunnel
...
Please create a SSH tunnel from your workstation to these ports on biowulf.
On Linux/MacOS, open a terminal and run:

    ssh -L 44945:localhost:44945 user@biowulf.nih.gov
[user@cn4292 ~]$

Note the PORT1 value (in this case 44945) and the SLURM_NODELIST value (in this case cn4292); both are needed later.
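If the sinteractive message scrolls out of view, the same values can be printed again on the compute node, where the --tunnel option has set $PORT1 (it is used that way in the jupyter command on this page). The bash sketch below prints both values together with the single-hop ssh command in the form sinteractive printed it; print_tunnel_info is a hypothetical helper, not a Biowulf command, and it assumes your Biowulf user name is in $USER unless given explicitly.

```shell
# Minimal sketch (bash): print the values to note down and the tunnel
# command, in the same form as the one printed by sinteractive --tunnel.
# print_tunnel_info is a hypothetical helper, not a Biowulf command.
print_tunnel_info() {
  local port=${1:-} node=${2:-} user=${3:-${USER:-}}
  if [ -z "$port" ] || [ -z "$node" ]; then
    echo "usage: print_tunnel_info PORT NODE [USER]" >&2
    return 1
  fi
  printf 'PORT1=%s\n' "$port"
  printf 'SLURM_NODELIST=%s\n' "$node"
  printf 'ssh -L %s:localhost:%s %s@biowulf.nih.gov\n' "$port" "$port" "$user"
}

# Usage on the compute node:
#   print_tunnel_info "$PORT1" "$SLURM_NODELIST"
```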

[user@cn4292 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn4292 ~]$ module load heme_binder_diffusion
[user@cn4292 ~]$ cp -r $HBD_SRC .
[user@cn4292 ~]$ cd heme_binder_diffusion
[user@cn4292 ~]$ jupyter notebook  --no-browser --port $PORT1
...
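Because the URL printed by jupyter notebook is needed on your local computer, it can be convenient to start Jupyter in the background with its messages saved to a file and then pull the URL out of that file. This is a sketch under assumptions: the log name jupyter.log and the helper jupyter_url are hypothetical, and the pattern assumes the usual Jupyter form http://localhost:PORT/...?token=... (or the same with 127.0.0.1).

```shell
# Minimal sketch (bash): extract the first localhost login URL from a
# saved Jupyter log. jupyter_url and jupyter.log are hypothetical names.
jupyter_url() {
  local log=${1:-jupyter.log}
  # Match http://localhost:PORT/...token=... or http://127.0.0.1:PORT/...token=...
  grep -Eo 'http://(localhost|127\.0\.0\.1):[0-9]+/[^[:space:]]*token=[[:alnum:]]+' "$log" \
    | head -n 1
}

# Usage on the compute node:
#   jupyter notebook --no-browser --port "$PORT1" > jupyter.log 2>&1 &
#   sleep 10; jupyter_url jupyter.log
```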

Copy the URL printed by the jupyter command. Then, on your local computer, open a second terminal window and type:

PORT1=44945
SLURM_NODELIST=cn4292
ssh -t -L $PORT1:localhost:$PORT1 user@biowulf.nih.gov "ssh -L $PORT1:localhost:$PORT1 $SLURM_NODELIST"

After logging in, paste the URL you copied into a browser on your local computer.

Click the "pipeline.ipynb" link to open the notebook and start running the pipeline. As you go, replace every occurrence of the string "user" with your Biowulf user ID.
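Instead of editing each occurrence by hand, the substitution can be done with sed. The bash sketch below assumes GNU sed (as on Biowulf Linux nodes), the notebook name pipeline.ipynb, and that your Biowulf user ID is in $USER; set_notebook_user is a hypothetical helper. It replaces only the whole word "user", so strings such as "username" are left alone, keeps a .bak copy, and prints how many lines changed. Review the result afterwards, since not every "user" in the notebook is necessarily a placeholder.

```shell
# Minimal sketch (bash + GNU sed): replace the whole word "user" with a
# Biowulf user ID in a notebook. set_notebook_user is a hypothetical helper.
set_notebook_user() {
  local nb=${1:-pipeline.ipynb} id=${2:-${USER:-}}
  if [ -z "$id" ] || [ ! -f "$nb" ]; then
    echo "usage: set_notebook_user [NOTEBOOK] [USER_ID]" >&2
    return 1
  fi
  cp "$nb" "$nb.bak" || return 1      # keep an untouched copy
  # \b is a GNU sed word boundary; user IDs contain no / or &,
  # so they are safe in the replacement text.
  sed -i "s/\buser\b/$id/g" "$nb"
  # Count changed lines; grep -c prints 0 and returns non-zero if none changed
  diff "$nb.bak" "$nb" | grep -c '^>'
}

# Usage, inside the heme_binder_diffusion directory:
#   set_notebook_user pipeline.ipynb "$USER"
```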

[user@cn4292 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$