heme_binder_diffusion: de novo heme binding protein design pipeline using RFdiffusionAA
RoseTTAFold All-Atom (RFAA) is a deep network capable of modeling full biological assemblies containing proteins, nucleic acids, small molecules, metals, and covalent modifications, given the sequences of the polymers and the atomic bonded geometry of the small molecules and covalent modifications. The heme_binder_diffusion pipeline employs RFdiffusionAA, which is built on RoseTTAFold All-Atom, to perform de novo design of heme-binding proteins.
Reference:
- Rohith Krishna, Jue Wang, Woody Ahern, Pascal Sturmfels, Preetham Venkatesh, Indrek Kalvet,
Gyu Rie Lee, Felix S Morey-Burrows, Ivan Anishchenko, Ian R Humphreys, Ryan McHugh, Dionne Vafeados,
Xinting Li, George A Sutherland, Andrew Hitchcock, C Neil Hunter, Minkyung Baek, Frank DiMaio, David Baker,
Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom,
bioRxiv preprint, doi: https://doi.org/10.1101/2023.10.09.561603
- Introducing All-Atom versions of RoseTTAFold and RFdiffusion
Documentation
- De novo heme binding protein design pipeline using RFdiffusionAA
- heme_binder_diffusion on Github
- RFdiffusion AA on Github
- Code for RoseTTAFold All-Atom on Github
- RFDesign on Github
- ProteinMPNN on Github
- LigandMPNN on Github
Important Notes
- Module Name: heme_binder_diffusion (see the modules page for more information)
- Unusual environment variables set
- HBD_HOME installation directory
- HBD_BIN executable directory
- HBD_SRC source code directory
- HBD_DATA sample data directory
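The variables listed above can be checked after loading the module. The following is a minimal sketch, not part of the module itself: check_hbd_env is a hypothetical helper name, and the only assumption it makes is that each of the four variables should point to an existing directory.

```shell
# Hypothetical helper (not shipped with the module): report whether each
# HBD_* variable is set and points to an existing directory.
check_hbd_env() {
    rc=0
    for name in HBD_HOME HBD_BIN HBD_SRC HBD_DATA; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name: not set"
            rc=1
        elif [ ! -d "$value" ]; then
            echo "$name: $value (not a directory)"
            rc=1
        else
            echo "$name: $value"
        fi
    done
    return $rc
}
```

Running check_hbd_env after "module load heme_binder_diffusion" prints one line per variable and returns a non-zero status if any of them is missing.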
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=20g -c8 --gres=gpu:k80:1,lscratch:10
[user@cn3335 ~]$ module load heme_binder_diffusion
[+] Loading git 2.39.2 ...
[+] Loading jupyter
[+] Loading apptainer 1.1.6 on cn3335
[+] Loading heme_binder_diffusion 20240319
To test the RFdiffusion-AA application:
[user@cn3335 ~]$ git clone https://github.com/baker-laboratory/rf_diffusion_all_atom
[user@cn3335 ~]$ git clone https://github.com/baker-laboratory/RoseTTAFold-All-Atom
[user@cn3335 ~]$ cd rf_diffusion_all_atom
[user@cn3335 ~]$ rm -rf rf2aa && ln -s ../RoseTTAFold-All-Atom/rf2aa
[user@cn3335 ~]$ wget http://files.ipd.uw.edu/pub/RF-All-Atom/weights/RFDiffusionAA_paper_weights.pt
[user@cn3335 ~]$ run_aa -u run_inference.py inference.deterministic=True diffuser.T=100 \
    inference.output_prefix=output/ligand_only/sample \
    inference.input_pdb=input/7v11.pdb \
    contigmap.contigs=[\'150-150\'] \
    inference.ligand=OQO
...
Calculating chi_beta_T dictionary...
Done calculating chi_beta_T dictionaries. They are now cached.
Done calculating chi_beta_T, chi_alphas_T, and chi_abars_T dictionaries.
[2024-03-14 10:25:36,292][inference.model_runners][INFO] - 10:25:36: Timestep 100, current sequence: ??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
[2024-03-14 10:25:37,598][inference.model_runners][INFO] - 10:25:37: Timestep 99, current sequence: ??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
...
SA?AS??S??K?NASAA?SAAA??AA?SAA?AA?A?S?K?KA?AA?SSAAAA?AAA?Saaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
[2024-03-14 10:47:36,968][inference.model_runners][INFO] - 10:47:36: Timestep 2, current sequence: ?ASAS?SASA?S??SASAAA???SAAAA???S?SSA???S???K?Q??A???K?K???????S???A??SA????R?ASA?A????A?AAA?SA?AS?AS??K?KASSAASAAA??AA??AA?AA?A?S?K?KS?AA?ASAAASAAAA?Saaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
[2024-03-14 10:47:38,249][inference.model_runners][INFO] - 10:47:38: Timestep 1, current sequence: ?ASAS?SASAAS??SASAAS??ASASA??A?S?ASA???S???K?Q??A???K?K????A??S???S??SA????R?ASA?A??A???SAA?SA?AA??S??K?K?SAAASAAAA?AAASAA?AA?A?S?K?KS?AS?AAAAASAAAA?Saaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
[2024-03-14 10:47:38,251][__main__][INFO] - Finished design in 2.14 minutes
[2024-03-14 10:47:42,543][__main__][INFO] - design : ./output/ligand_only/sample_9.pdb
[2024-03-14 10:47:42,543][__main__][INFO] - Xt traj: ./output/ligand_only/traj/sample_9_Xt-1_traj.pdb
[2024-03-14 10:47:42,543][__main__][INFO] - X0 traj: ./output/ligand_only/traj/sample_9_pX0_traj.pdb
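The log above reports each finished design as a PDB file under the output prefix, with trajectory files in a traj/ subdirectory. As a rough sketch, the final designs could be listed as follows. list_designs is a hypothetical helper name, and the default directory is only assumed from the inference.output_prefix used in the sample command.

```shell
# Hypothetical helper: list final design PDB files in an output directory,
# skipping the traj/ subdirectory that holds the trajectory PDBs.
# The default directory is an assumption taken from the sample command above.
list_designs() {
    dir=${1:-output/ligand_only}
    find "$dir" -maxdepth 1 -type f -name '*.pdb' | sort
}
```

Calling list_designs with no argument inside rf_diffusion_all_atom prints one path per design; piping the result to "wc -l" gives the number of designs produced so far.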
End the interactive session:
[user@cn3335 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
To run the heme_binder_diffusion (HBD) pipeline:
[user@biowulf]$ sinteractive --mem=20g -c8 --gres=gpu:k80:1,lscratch:10 --tunnel
...
Please create a SSH tunnel from your workstation to these ports on biowulf.
On Linux/MacOS, open a terminal and run:
ssh -L 44945:localhost:44945 denisovga@biowulf.nih.gov
[user@cn4292]
Store the PORT1 number (in this case 44945) and the SLURM_NODELIST value (in this case cn4292).
[user@cn4292 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn4292 ~]$ module load heme_binder_diffusion
[user@cn4292 ~]$ cp -r $HBD_SRC .
[user@cn4292 ~]$ cd heme_binder_diffusion
[user@cn4292 ~]$ jupyter notebook --no-browser --port $PORT1
...
Store the URL produced by the last command. Then, on your local computer, open a second terminal window and type:
PORT1=44945
SLURM_NODELIST=cn4292
ssh -t -L $PORT1:localhost:$PORT1 biowulf "ssh -L $PORT1:localhost:$PORT1 $SLURM_NODELIST"
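The two-hop tunnel (local computer to the Biowulf login node, then login node to the compute node) can be sketched as a small helper that assembles the command from the stored port and node name. tunnel_cmd is a hypothetical name, and "biowulf" is assumed to be a host alias that resolves on your local computer, as in the command above.

```shell
# Hypothetical helper: print the two-hop SSH tunnel command for a given
# port and compute node, so it can be reviewed before it is run.
# "biowulf" is assumed to be a host alias known to the local ssh client.
tunnel_cmd() {
    port=$1
    node=$2
    printf 'ssh -t -L %s:localhost:%s biowulf "ssh -L %s:localhost:%s %s"\n' \
        "$port" "$port" "$port" "$port" "$node"
}

# Example with the values from the sample session.
tunnel_cmd 44945 cn4292
```

Printing the command instead of executing it keeps the sketch safe to try; the printed line can be pasted into the terminal once the port and node look right.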
After logging in, paste the URL you stored into a browser on your local system.
Click the "pipeline.ipynb" link and start running the pipeline. While running it, replace every occurrence of the "user" string with your Biowulf user ID.
End the interactive session:
[user@cn4292 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$