AMPL on Biowulf
AMPL is the Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium Modeling PipeLine for drug discovery.
Reference:
- Minnich AJ, et al. AMPL: A Data-Driven Modeling Pipeline for Drug Discovery. J Chem Inf Model. 2020;60(4):1955-1968.
Documentation
- AMPL on GitHub: https://github.com/ATOMScience-org/AMPL
Important Notes
- Module Name: ampl (see the modules page for more information)
- The ampl module sets the $AMPL_HOME environment variable which points to the location of the installation. A matched version of the GitHub repo is present in this directory for copying. If you want to modify ampl scripts before running, you should copy this directory to your own space.
- AMPL is installed within a pre-made conda environment. All binaries from that environment are symlinked into your path, which effectively replaces your existing Python. Do not attempt to use it simultaneously with another Python environment.
- If you find that your sys.path contains libraries within your home directory that conflict with the central installation of ampl, you can start python with the -Es flags, or run export PYTHONNOUSERSITE=True (see the sketch after this list).
- The following commands and scripts are available after loading the module. Please note that some of these scripts may not be executable. Please contact the NIH HPC staff if other commands are required within the AMPL environment:
- ave_splitter.py
- chem_diversity.py
- compare_models.py
- curate_data.py
- data_curation_functions.py
- datastore_functions.py
- dist_metrics.py
- diversity_plots.py
- featurization.py
- genTestset.py
- hyperparam_search_wrapper.py
- hyper_perf_plots.py
- ipython
- jupyter
- jupyter-notebook
- jupyter-run
- llnl_utils.py
- model_datasets.py
- model_pipeline.py
- model_tracker.py
- model_wrapper.py
- nosetests
- open-docs.py
- parameter_parser.py
- perf_data.py
- perf_plots.py
- process_slurm.py
- pubchem_utils.py
- pytest
- python
- rdkit_easy.py
- splitting.py
- struct_utils.py
- temporal_splitter.py
- tensorboard
- transformations.py
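As a minimal sketch of the notes above in practice (the /data/$USER destination is illustrative; adjust to your own space):

module load ampl                       # prepends the AMPL environment's binaries to your PATH
cp -r ${AMPL_HOME}/ampl /data/$USER/   # copy the matched source tree before modifying scripts
export PYTHONNOUSERSITE=True           # keep conflicting ~/.local packages out of sys.path
python -Es                             # alternative: -E ignores PYTHON* variables, -s skips the user site directory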
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program.
Sample session:
[user@biowulf ~]$ sinteractive --gres=lscratch:10 --mem=8g -c4 --tunnel
salloc: Pending job allocation 35296239
salloc: job 35296239 queued and waiting for resources
salloc: job 35296239 has been allocated resources
salloc: Granted job allocation 35296239
salloc: Waiting for resource configuration
salloc: Nodes cn0857 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.35296239.0
slurmstepd: error: x11: unable to read DISPLAY value

Created 1 generic SSH tunnel(s) from this compute node to biowulf for your use
at port numbers defined in the $PORTn ($PORT1, ...) environment variables.
Please create a SSH tunnel from your workstation to these ports on biowulf.
On Linux/MacOS, open a terminal and run:

ssh -L 36858:localhost:36858 user@biowulf.nih.gov

For Windows instructions, see https://hpc.nih.gov/docs/tunneling

[user@cn0857 ~]$ module load ampl
[+] Loading ampl 1.3.0 on cn0857
[user@cn0857 ~]$ ipython
Python 3.7.10 (default, Feb 26 2021, 18:47:35)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import atomsci

In [2]: quit()

[user@cn0857 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn0857 35296239]$ cp -r ${AMPL_HOME}/ampl .
[user@cn0857 35296239]$ cd ampl/atomsci/ddm/
[user@cn0857 ddm]$ jupyter-notebook --no-browser --port=$PORT1
[I 14:23:09.610 NotebookApp] Serving notebooks from local directory: /lscratch/35296239/ampl/atomsci/ddm
[I 14:23:09.610 NotebookApp] Jupyter Notebook 6.4.10 is running at:
[I 14:23:09.610 NotebookApp] http://localhost:36858/?token=5ebfcff3cfccd637d36250f9a5a8191a22011edd2a262db5
[I 14:23:09.610 NotebookApp]  or http://127.0.0.1:36858/?token=5ebfcff3cfccd637d36250f9a5a8191a22011edd2a262db5
[I 14:23:09.610 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 14:23:09.618 NotebookApp]
    To access the notebook, open this file in a browser:
        file:///spin1/home/linux/user/.local/share/jupyter/runtime/nbserver-63375-open.html
    Or copy and paste one of these URLs:
        http://localhost:36858/?token=5ebfcff3cfccd637d36250f9a5a8191a22011edd2a262db5
     or http://127.0.0.1:36858/?token=5ebfcff3cfccd637d36250f9a5a8191a22011edd2a262db5
Now in a new terminal on your local machine, you should initiate a new ssh session to Biowulf to establish the ssh tunnel. Using the port that was allocated to you during the sinteractive step above:
[user@mymachine ~]$ ssh -L 36858:localhost:36858 user@biowulf.nih.gov
[user@biowulf ~]$
This will allow you to open a local browser and view the Jupyter Notebook session running on the compute node:
^C[I 11:48:01.862 NotebookApp] interrupted
Serving notebooks from local directory: /lscratch/1151805/ampl/atomsci/ddm
1 active kernel
The Jupyter Notebook is running at:
http://localhost:36858/?token=05982663a539c9fac05608e4b7ca76649356f1cdb3dcc750
Shutdown this notebook server (y/[n])? y
[C 18:43:10.271 NotebookApp] Shutdown confirmed
[I 11:48:03.637 NotebookApp] Shutting down 1 kernel
[I 11:48:03.838 NotebookApp] Kernel shutdown: 1b0c8fe9-a2de-45bc-ab5a-bea16e60dc0f
[user@cn0859 ddm]$ exit
exit
salloc.exe: Relinquishing job allocation 1151805
Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. ampl.sh). For example:
#!/bin/bash
set -e
module load ampl
# command here
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] ampl.sh
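For a more concrete sketch, here is a hypothetical ampl.sh that runs one of the scripts listed above. The --config_file flag and the config path are assumptions, not verified AMPL usage; consult the AMPL documentation for the actual parameter interface:

#!/bin/bash
set -e
module load ampl
# hypothetical training run: model_pipeline.py is on the PATH once the module
# is loaded, but the flag and config file below are placeholders
python model_pipeline.py --config_file /data/$USER/train_config.json

This could then be submitted with explicit resources, e.g. sbatch --cpus-per-task=4 --mem=16g ampl.sh.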
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.
Create a swarmfile (e.g. ampl.swarm). For example:
cmd1
cmd2
cmd3
cmd4
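As a concrete (hypothetical) illustration, each line could run the same curation script on a different dataset; curate_data.py is on the PATH after the module loads, but the arguments here are placeholders:

# one independent process per line; arguments are illustrative only
python curate_data.py dataset1.csv
python curate_data.py dataset2.csv
python curate_data.py dataset3.csv
python curate_data.py dataset4.csv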
Submit this job using the swarm command.
swarm -f ampl.swarm [-g #] [-t #] --module ampl
where
-g #          Number of Gigabytes of memory required for each process (1 line in the swarm command file)
-t #          Number of threads/CPUs required for each process (1 line in the swarm command file)
--module ampl Loads the ampl module for each subjob in the swarm
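For instance, to give each process 8 GB of memory and 4 CPUs (illustrative values):

swarm -f ampl.swarm -g 8 -t 4 --module ampl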