Biowulf High Performance Computing at the NIH
RoseTTAFold on Biowulf

RoseTTAFold predicts protein structures and interactions with a three-track neural network, in which information at the 1D sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):

[user@biowulf]$ sinteractive --cpus-per-task=10 --mem=60G
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ module load RoseTTAFold
[user@cn3144]$ mkdir -p /data/$USER/rosettafold_test/
[user@cn3144]$ cd /data/$USER/rosettafold_test/
[user@cn3144]$ cp -r ${ROSETTAFOLD_TEST_DATA:-none}/* .
[user@cn3144]$ run_e2e_ver_part1.sh input.fa e2e_out
Running HHblits
Running PSIPRED
Running hhsearch
Running end-to-end prediction
Done with part1, please run part2 on GPU node

[user@cn3144]$ run_pyrosetta_ver_part1.sh input.fa pyrosetta_out
Running HHblits
Running PSIPRED
Running hhsearch
Predicting distance and orientations
Running parallel RosettaTR.py
Done with part1, please run part2 at GPU node

[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226

[user@biowulf]$ sinteractive --cpus-per-task=2 --mem=10g --gres=gpu:p100:1
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ module load RoseTTAFold
[user@cn3144]$ cd /data/$USER/rosettafold_test/
[user@cn3144]$ run_e2e_ver_part2.sh input.fa e2e_out
Running end-to-end prediction
Done with part2 (prediction)

[user@cn3144]$ run_pyrosetta_ver_part2.sh input.fa pyrosetta_out
Picking final models
Final models saved in: pyrosetta_out/model
Done with part2 (pick final models)
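
The two-part workflow above (part 1 on a CPU node, part 2 on a GPU node) could also be run as two dependent batch jobs. The following is a minimal sketch, not a tested Biowulf recipe: the file names rosettafold_part1.sh and rosettafold_part2.sh are hypothetical, the resource requests are copied from the interactive sessions above, and the test data is assumed to already be in /data/$USER/rosettafold_test/. It relies on the standard Slurm options --parsable and --dependency=afterok, and submits only when sbatch is available.

```shell
#!/bin/bash
# Sketch: split the end-to-end run into a CPU job (part 1) and a GPU job (part 2).
# rosettafold_part1.sh and rosettafold_part2.sh are hypothetical file names.
set -euo pipefail

# Part 1: MSA generation and template search (CPU only)
cat > rosettafold_part1.sh <<'EOF'
#!/bin/bash
set -e
module load RoseTTAFold
cd /data/$USER/rosettafold_test/
run_e2e_ver_part1.sh input.fa e2e_out
EOF

# Part 2: end-to-end structure prediction (GPU)
cat > rosettafold_part2.sh <<'EOF'
#!/bin/bash
set -e
module load RoseTTAFold
cd /data/$USER/rosettafold_test/
run_e2e_ver_part2.sh input.fa e2e_out
EOF

# Submit only where Slurm is available; part 2 starts after part 1 succeeds.
if command -v sbatch >/dev/null 2>&1; then
    jid=$(sbatch --parsable --cpus-per-task=10 --mem=60g rosettafold_part1.sh)
    sbatch --dependency=afterok:"$jid" --cpus-per-task=2 --mem=10g \
           --partition=gpu --gres=gpu:p100:1 rosettafold_part2.sh
else
    echo "sbatch not found; job scripts written but not submitted"
fi
```

With afterok, the GPU job stays pending until the CPU job exits successfully, so no GPU is held while HHblits and hhsearch run.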

For protein-protein interaction (PPI) screening using the faster 2-track version:

[user@biowulf]$ sinteractive --cpus-per-task=2 --mem=10g --gres=gpu:p100:1
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ module load RoseTTAFold
[user@cn3144]$ mkdir -p /data/$USER/rosettafold_test/
[user@cn3144]$ cd /data/$USER/rosettafold_test/
[user@cn3144]$ cp -r ${ROSETTAFOLD_TEST_DATA:-none}/* .
[user@cn3144]$ cd complex_2track
[user@cn3144]$ python ~/network_2track/predict_msa.py -msa input.a3m -npz complex_2track.npz -L1 218
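
The -L1 value in the command above is assumed here to be the residue count of the first chain in the paired MSA (this reading comes from the upstream RoseTTAFold usage notes, not from this page). For other inputs, a minimal sketch of deriving that number from a single-sequence FASTA file follows; chainA_example.fa and the seq_len helper are hypothetical names used only for illustration.

```shell
# Hypothetical single-sequence FASTA for the first chain.
cat > chainA_example.fa <<'EOF'
>chainA example
MKTAYIAKQR
QISFVKSHFS
EOF

# Sum the lengths of all non-header lines to get the residue count.
seq_len() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

L1=$(seq_len chainA_example.fa)
echo "$L1"
```

The result could then be passed on the command line as -L1 "$L1" instead of a hard-coded number.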

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. rosettafold.sh). For example:


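#!/bin/bash
# Stop at the first failing command
set -e
module load RoseTTAFold
# Create the working directory if needed and copy in the test data
mkdir -p /data/$USER/rosettafold_test/
cd /data/$USER/rosettafold_test/
cp -r ${ROSETTAFOLD_TEST_DATA:-none}/* .
# Model the complex from the paired MSA (paired.a3m)
cd complex_modeling
python ~/network/predict_complex.py -i paired.a3m -o complex3 -Ls 218 310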

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=2 --mem=10g --partition=gpu --gres=gpu:v100x:1 rosettafold.sh