RoseTTAFold on Biowulf
RoseTTAFold provides accurate prediction of protein structures and interactions using a three-track neural network, in which information at the 1D sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated.
References:
- Baek M et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021 Jul 15.
Documentation
Important Notes
- Module Name: RoseTTAFold (see the modules page for more information)
- RoseTTAFold/allatom uses Hydra to compose config files, so after you load the module, save a local copy of the config in your data directory and follow the instructions below.
cp -r ${RFAA_CONF:-none} /data/$USER/
- RoseTTAFold/1.1.0 needs write permission to the model, so save a local copy of the network in your home directory after you load the module for the first time.
module load RoseTTAFold
cp -r ${ROSETTAFOLD_NETWORK:-none} ~/
- To run complex modeling, you also need to save a local copy of the weights in your home directory.
cp -r ${ROSETTAFOLD_WEIGHTS:-none} ~/
- For PPI screening using the faster 2-track version (only available for RoseTTAFold/1.1.0), you need to copy a different network to your home directory.
cp -r ${ROSETTAFOLD_NETWORK_2TRACK:-none} ~/
- Since only the last step of run_e2e_ver.sh and run_pyrosetta_ver.sh can use a GPU, we strongly suggest running the edited pipeline, which is split into part1 (CPU- and memory-heavy) and part2 (GPU). See the examples under Interactive job.
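The cp commands above fall back to the literal name none when a variable is unset, so cp then fails with a missing-file error. The sketch below wraps the same copy in a check that the source directory exists and that no earlier copy is overwritten. The function name copy_module_dir is hypothetical and is not provided by the module.

```shell
# copy_module_dir SRC DEST_PARENT
# Hypothetical helper (not part of the RoseTTAFold module).
# Copies directory SRC into DEST_PARENT unless SRC is unset or missing,
# or a copy with the same name already exists there.
copy_module_dir() {
    local src="$1" dest_parent="$2" target
    if [ -z "$src" ] || [ ! -d "$src" ]; then
        echo "skip: source '${src:-unset}' is not a directory (is the module loaded?)" >&2
        return 1
    fi
    target="$dest_parent/$(basename -- "$src")"
    if [ -e "$target" ]; then
        echo "keep: $target already exists"
        return 0
    fi
    mkdir -p "$dest_parent"
    cp -r "$src" "$dest_parent/"
    echo "copied: $src -> $target"
}
```

After module load RoseTTAFold, it could be called as copy_module_dir "${ROSETTAFOLD_NETWORK:-}" "$HOME", and in the same way for ROSETTAFOLD_WEIGHTS, ROSETTAFOLD_NETWORK_2TRACK, or RFAA_CONF with /data/$USER as the destination.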
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program.
Sample session (user input in bold):
The examples below cover two module versions: RoseTTAFold/1.1.0 first, then RoseTTAFold/allatom.
[user@biowulf]$ sinteractive --cpus-per-task=10 --mem=60G
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144]$ module load RoseTTAFold
[user@cn3144]$ mkdir /data/$USER/rosettafold_test/
[user@cn3144]$ cd /data/$USER/rosettafold_test/
[user@cn3144]$ cp -r ${ROSETTAFOLD_TEST_DATA:-none}/* .
[user@cn3144]$ run_e2e_ver_part1.sh input.fa e2e_out
Running HHblits
Running PSIPRED
Running hhsearch
Running end-to-end prediction
Done with part1, please run part2 on GPU node
[user@cn3144]$ run_pyrosetta_ver_part1.sh input.fa pyrosetta_out
Running HHblits
Running PSIPRED
Running hhsearch
Predicting distance and orientations
Running parallel RosettaTR.py
Done with part1, please run part2 at GPU node
[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226

[user@biowulf]$ sinteractive --cpus-per-task=2 --mem=10g --gres=gpu:p100:1
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144]$ module load RoseTTAFold
[user@cn3144]$ cd /data/$USER/rosettafold_test/
[user@cn3144]$ run_e2e_ver_part2.sh input.fa e2e_out
run_e2e_ver_part2.sh input.fa e2e_out
Running end-to-end prediction
Done with part2 (prediction)
[user@cn3144]$ run_pyrosetta_ver_part2.sh input.fa pyrosetta_out
Picking final models
Final models saved in: pyrosetta_out/model
Done with part2 (pick final models)
For PPI screening using the faster 2-track version:
[user@biowulf]$ sinteractive --cpus-per-task=2 --mem=10g --gres=gpu:p100:1
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144]$ module load RoseTTAFold
[user@cn3144]$ mkdir /data/$USER/rosettafold_test/
[user@cn3144]$ cd /data/$USER/rosettafold_test/
[user@cn3144]$ cp -r ${ROSETTAFOLD_TEST_DATA:-none}/* .
[user@cn3144]$ cd complex_2track
[user@cn3144]$ python ~/network_2track/predict_msa.py -msa input.a3m -npz complex_2track.npz -L1 218
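The 2-track command takes a chain length through -L1 (218 in the test data). Assuming -L1 is the residue count of the first chain in the paired MSA, a small helper can compute it from that chain's FASTA file instead of counting by hand. The function name first_seq_length and the file chainA.fa mentioned below are hypothetical.

```shell
# first_seq_length FASTA
# Hypothetical helper: prints the number of residues in the first record
# of a FASTA file, ignoring whitespace and any later records.
first_seq_length() {
    awk '
        /^>/ { if (seen) exit; seen = 1; next }   # stop at the second header
        seen { gsub(/[ \t\r]/, ""); n += length($0) }
        END  { print n + 0 }
    ' "$1"
}
```

With a file chainA.fa holding the first chain, the value could be passed as -L1 "$(first_seq_length chainA.fa)". Check the result against the query row of input.a3m before a long screening run.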
Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. rosettafold.sh). For example:
#!/bin/bash
set -e
module load RoseTTAFold
cd /data/$USER/rosettafold_test/
cp -r ${ROSETTAFOLD_TEST_DATA:-none}/* .
cd complex_modeling
python ~/network/predict_complex.py -i paired.a3m -o complex3 -Ls 218 310
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=2 --mem=10g --partition=gpu --gres=gpu:v100x:1 rosettafold.sh
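The two-part pipeline shown in the interactive session can also be run as two batch jobs, with part2 on a GPU node starting only after part1 finishes successfully. The sketch below uses the standard sbatch options --parsable and --dependency=afterok. The function name submit_chain, the SBATCH override variable, and the script names part1.sh and part2.sh are hypothetical. The resource values are taken from the examples on this page and may need adjusting.

```shell
#!/bin/bash
# submit_chain PART1_SCRIPT PART2_SCRIPT
# Hypothetical helper (not part of the RoseTTAFold module).
# Set SBATCH to another command to preview or test without submitting.
submit_chain() {
    local part1="$1" part2="$2" sbatch_cmd="${SBATCH:-sbatch}"
    local jid
    # --parsable makes sbatch print only the job ID (optionally ";cluster")
    jid=$("$sbatch_cmd" --parsable --cpus-per-task=10 --mem=60g "$part1") || return 1
    jid=${jid%%;*}
    # afterok: part2 becomes eligible only if part1 exits with status 0
    "$sbatch_cmd" --parsable --dependency="afterok:$jid" \
        --cpus-per-task=2 --mem=10g --partition=gpu --gres=gpu:v100x:1 "$part2"
}
```

Here part1.sh would load the module and call run_e2e_ver_part1.sh input.fa e2e_out, and part2.sh would load the module and call run_e2e_ver_part2.sh input.fa e2e_out, both from the same working directory.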
Sample session for RoseTTAFold/allatom:
[user@biowulf]$ sinteractive --cpus-per-task=4 --mem=16g --gres=gpu:p100:1
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144]$ module load RoseTTAFold/allatom
[user@cn3144]$ cd /data/$USER/
[user@cn3144]$ cp -r ${ROSETTAFOLD_TEST_DATA:-none} .
[user@cn3144]$ python -m rf2aa.run_inference --config-name protein
[user@cn3144]$ cp -r ${RFAA_CONF:-none} . # copy the config and modify it to use customized input
[user@cn3144]$ python -m rf2aa.run_inference \
    --config-name protein \
    --config-path /data/$USER/config/inference
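Hydra resolves --config-name NAME to the file NAME.yaml inside the directory given by --config-path, so a misspelled name or a missing local copy only shows up when the run starts. The sketch below lists the names available in a config directory and checks for one before launching. The function names are hypothetical, and the directory /data/$USER/config/inference is assumed from the session above.

```shell
# list_configs DIR
# Hypothetical helper: prints the config names (file names without .yaml)
# found directly in DIR, one per line.
list_configs() {
    local dir="$1" f
    for f in "$dir"/*.yaml; do
        [ -e "$f" ] || continue      # the glob matched nothing
        basename -- "$f" .yaml
    done
}

# has_config DIR NAME
# Succeeds if DIR/NAME.yaml exists as a regular file.
has_config() {
    [ -f "$1/$2.yaml" ]
}
```

For example, has_config /data/$USER/config/inference protein could guard the python -m rf2aa.run_inference call in a batch script, so that the job exits early with a clear message instead of failing inside Hydra.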
Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. rosettafold.sh). For example:
#!/bin/bash
set -e
module load RoseTTAFold/allatom
cd /data/$USER/rosettafold_test/
cp -r ${ROSETTAFOLD_TEST_DATA:-none} .
python -m rf2aa.run_inference --config-name nucleic_acid
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=2 --mem=10g --partition=gpu --gres=gpu:v100x:1 rosettafold.sh