CANDLE on Biowulf

CANDLE (CANcer Distributed Learning Environment) is an open-source software platform providing deep learning methodologies that scale very efficiently on the world’s fastest supercomputers. Developed initially to address three top challenges facing the cancer community, CANDLE can increasingly be used to tackle problems in other application areas. The SDSI team at the Frederick National Laboratory for Cancer Research, sponsored by the National Cancer Institute, has recently installed CANDLE on NIH’s Biowulf supercomputer for all to use.

One of CANDLE's strongest attributes is its functionality for performing hyperparameter optimization (HPO). In a machine/deep learning model, "hyperparameters" refer to any variables that define the model aside from the model’s "weights." For a given set of hyperparameters (typically 5-20), the corresponding model’s weights (typically tens of thousands) are iteratively optimized using algorithms such as gradient descent. This optimization of the model’s weights, a process called "training," is typically run very efficiently on graphics processing units (GPUs) and takes anywhere from 30 minutes to a couple of days.

If a measure of loss is assigned to each model trained on the same set of data, we would ultimately like to choose the model (i.e., the set of hyperparameters) that best fits that dataset by minimizing the loss. HPO is this process of choosing the best set of hyperparameters. The most common way of determining the optimal set is to run one training job for every desired combination of hyperparameters and choose the combination that produces the lowest loss. In CANDLE this workflow is labeled "grid" (in other contexts it is called "grid search"). Another way of determining the optimal set of hyperparameters is a Bayesian approach, in which information about how well prior sets of hyperparameters performed is used to select the next sets of hyperparameters to try. In CANDLE this type of workflow is labeled "bayesian".
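
As a toy illustration of the grid idea, the following self-contained Python sketch runs one "training job" per hyperparameter set and keeps the set with the lowest loss (this is not CANDLE code; train_and_get_loss is a hypothetical stand-in for actually training a model):

# Toy illustration of grid-search HPO (not CANDLE code)
def train_and_get_loss(hyperparams):
    # Hypothetical stand-in for a real training job that returns a validation loss
    return abs(hyperparams["epochs"] - 20) / 100 + (0.1 if hyperparams["activation"] == "tanh" else 0.2)

hyperparameter_sets = [
    {"epochs": 15, "activation": "tanh"},
    {"epochs": 30, "activation": "tanh"},
    {"epochs": 15, "activation": "relu"},
    {"epochs": 30, "activation": "relu"},
]
losses = [train_and_get_loss(hp) for hp in hyperparameter_sets]
best = hyperparameter_sets[losses.index(min(losses))]
print("best hyperparameters:", best, "with loss", min(losses))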

HPO is not limited to machine/deep learning applications; it can be applied to any computational pipeline that is parametrized by a number of settings. With ever-increasing amounts of data, such applications, in addition to machine/deep learning ones, are growing at NCI and in the greater NIH community. Performing HPO yields better models for describing relationships in the data, and the better the model, the more accurate the predictions that can be made on new data. CANDLE is here to help with this, and this webpage serves as a complete guide for running CANDLE on Biowulf.

12/8/19: Click here for a step-by-step guide to running HPO on your own model with CANDLE.

Why Use CANDLE?

Why use CANDLE in the first place? For example, why not just submit a swarm of jobs, each using a different set of hyperparameters?

Quick Start

These steps will get you running a sample CANDLE job on Biowulf right away!

Step 1: Set up your environment

Once logged in to Biowulf, set up your environment by creating and entering a working directory in your /data/$USER (not /home/$USER) directory and loading the candle module:

[user@biowulf]$ mkdir /data/$USER/candle
[user@biowulf]$ cd /data/$USER/candle
[user@biowulf]$ module load candle

Step 2: Copy a template submission script to the working directory

Copy one of the three CANDLE templates to the working directory:

[user@biowulf]$ candle import-template <TEMPLATE>

Possible values of <TEMPLATE> are:

grid: Grid search using a Python model (simple deep neural network on the MNIST dataset; ~3 min. total runtime)
bayesian: Bayesian search using a Python model (one of the JDACS4C Pilot 1 models, a 1D convolutional network for classifying RNA-Seq gene expression profiles into normal or tumor tissue categories; ~24 min. total runtime)
r: Grid search using an R model (feature reduction on the TNBC dataset; ~6 min. total runtime)

Step 3: Run the job

Submit the job by running:

[user@biowulf]$ candle submit-job <TEMPLATE>_example.in

Summary of How to Use CANDLE

This section contains a summary of steps for running your own CANDLE job, which are detailed in the following sections.

Adapting Your Model to Work With CANDLE

You can run a CANDLE hyperparameter optimization (HPO) on your own machine/deep learning model or general workflow (generically called a "model script") by making two minimal modifications to your model script. For HPO, CANDLE accepts model scripts written in either Python or R.

Note: Prior to adapting your model script for use with CANDLE, you must ensure it runs standalone on a Biowulf compute node. This can be tested by requesting an interactive GPU node (e.g., sinteractive --gres=gpu:k20x:1 --mem=60G --cpus-per-task=16) and then running the model directly, e.g., python my_model_script.py or Rscript my_model_script.R; don’t forget to use the correct version of Python or R, if required!

Once you have confirmed that your model script runs as-is on Biowulf, modify it in two simple ways:

Step 1: Specify the hyperparameters

Specify the hyperparameters in your code using a variable named hyperparams of type dictionary (Python) or data.frame (R). E.g., in Python, if your model script my_model_script.py contains

n_convolutional_layers = 4
batch_size = 128

but these are parameters that you'd like to change during the CANDLE workflow, you should change those lines to

n_convolutional_layers = hyperparams['nconv_layers']
batch_size = hyperparams['batch_size']

Note: The "key" in the hyperparams dictionary should match the variable names in the CANDLE input file (following section), whereas the variables to which they are assigned in the model script should obviously match the names used in the rest of the script.

Likewise, in R, if your model script my_model_script.R contains

n_convolutional_layers <- 4
batch_size <- 128

you should change those lines to

n_convolutional_layers <- hyperparams[["nconv_layers"]]
batch_size <- hyperparams[["batch_size"]]

Step 2: Define the metric you would like to minimize

If your model is written in Python, either define a Keras history object named history (as in, e.g., the return value of a model.fit() method; validation loss will be minimized), e.g.,

history = model.fit(x_train, y_train, validation_data=(x_val, y_val), ...)

or define a single number named val_to_return that contains the value you would like to minimize, e.g.,

score = model.evaluate(x_test, y_test)
val_to_return = score[0]

Note: If you have named your Keras model model and you are using the return value of model.fit() as your minimization metric (as opposed to using val_to_return), you must specify the validation_data keyword in the call to model.fit() as shown above. This way the history attribute of model.fit()'s return value will contain a key called val_loss, which is the metric that CANDLE will use to evaluate the current set of hyperparameters. (Choosing the best set of hyperparameters based on a holdout dataset such as a validation dataset is good practice anyway!) If your model still doesn't seem to generate a val_loss key, try adding metrics=['accuracy'] to the call to model.compile().

If your model is written in R, define a single number named val_to_return that contains the metric you would like to minimize, e.g.,

val_to_return <- my_validation_loss

Note on minimization metric

Only the bayesian workflow actually uses the minimization metric: by definition, in order to determine the next sets of hyperparameters to try, it needs a measure of how well prior sets of hyperparameters performed. The grid workflow, which by definition runs training on all sets of hyperparameters regardless of how well prior sets performed, never actually uses the minimization metric. However, the val_to_return variable (or the history object in Python) is always required, so when running the grid workflow and you don't care to return any particular result from your model script, simply set it to a dummy value such as -7.

Typical values assigned to val_to_return include the training, testing, or validation loss (for a machine/deep learning model) or the workflow runtime (for optimizing workflow runtimes as in, e.g., benchmarking).
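
Putting Steps 1 and 2 together, here is a minimal sketch of what a fully CANDLE-adapted Keras model script might look like (illustrative only: hyperparams is supplied by CANDLE at run time, and the particular hyperparameter names, dataset, and architecture are just examples):

# Minimal sketch of a CANDLE-adapted Keras model script (illustrative only);
# hyperparams is assumed to be provided by CANDLE at run time, and its keys
# must match the hyperparameter names used in the CANDLE input file.
from tensorflow import keras

(x_train, y_train), (x_val, y_val) = keras.datasets.mnist.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0

# Step 1: hyperparameters are read from the hyperparams dictionary
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(hyperparams['num_filters'], activation=hyperparams['activation']),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer=hyperparams['optimizer'],
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Step 2: CANDLE reads the history object; validation_data is required so that
# history.history contains a 'val_loss' key
history = model.fit(x_train, y_train,
                    batch_size=hyperparams['batch_size'],
                    epochs=hyperparams['epochs'],
                    validation_data=(x_val, y_val))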

Creating the CANDLE Input File

In order to use CANDLE to run your own model script, you need to create an input file containing three sections:

  1. A &control section containing general settings
  2. A &default_model section containing the default hyperparameter values
  3. A &param_space section specifying the space of possible values of the hyperparameters

The input file must have a .in extension. A typical input file looks like:

&control
  model_script="$(pwd)/mnist_mlp.py"
  workflow="grid"
  ngpus=2
  gpu_type="k80"
  walltime="00:20:00"
/

&default_model
  epochs=20
  batch_size=128
  activation='relu'
  optimizer='rmsprop'
  num_filters=32
/

&param_space
  {"id": "hpset_01", "epochs": 15, "activation": "tanh"}
  {"id": "hpset_02", "epochs": 30, "activation": "tanh"}
  {"id": "hpset_03", "epochs": 15, "activation": "relu"}
  {"id": "hpset_04", "epochs": 30, "activation": "relu"}
  {"id": "hpset_05", "epochs": 10, "batch_size": 128}
  {"id": "hpset_06", "epochs": 10, "batch_size": 256}
  {"id": "hpset_07", "epochs": 10, "batch_size": 512}
/

Each section must begin with the section name (prefixed with an ampersand symbol &) on its own line and end with a forward slash / on its own line. The sections can appear in any order, and their names must be one of control, default_model, or param_space.

The three sections of the input file are explained in more detail below. The first two (&control and &default_model) consist of settings of the format left-hand-side = right-hand-side. Spaces on either side of the equals sign = do not matter. The third section (&param_space) has a different format depending on whether the grid or bayesian workflow is specified by the workflow setting in the &control section.

In general, file paths should always be absolute, e.g., /path/to/myfile.ext instead of myfile.ext. In the &control section it is permissible to use $(pwd) in the path in order to refer to the directory from which candle submit-job <INPUT-FILE> is called, e.g., $(pwd)/myfile.ext.

As usual in programming languages, strings should be quoted (err on the side of double quotes "). Finally, whitespace preceding the section bodies does not have any effect aside from making the input file easier to read.

Tip: A useful way to remember the section names, format, and typical settings is to adapt any of the templates above (i.e., grid, bayesian, or r) to your use case. Feel free to run candle import-template <TEMPLATE> with different <TEMPLATE> settings and examine the input file that is copied over in order to better understand what it does and the types of settings it can contain.

Note: The old CANDLE usage, in which three input files are used instead of a single input file with three sections, is still supported. Just use candle submit-job <SUBMISSION-SCRIPT> (a Bash script) instead of candle submit-job <INPUT-FILE> (a text file with a .in extension). The candle program determines how to process the argument based on the file's extension. (In fact, this is how candle still works: it breaks up the input file into three files and runs the generated submission script using the old method.)

Section 1: &control

Only five settings in the &control section are required; the rest are optional. String settings in this section (and only in this section) can access Bash environment variables, e.g., model_script = "/data/$USER/candle/mnist.py". Note: All settings in this section are converted by CANDLE to uppercase Bash variables, e.g., the value assigned to the model_script setting actually gets assigned to the Bash variable $MODEL_SCRIPT.

Required settings
model_script
This should point to the Python or R script that you would like to run. E.g., model_script = "/data/$USER/candle/mnist.py". This script must have been adapted to work with CANDLE (see the previous section). The filename extension will automatically determine whether Python or R will be used to run the model.
workflow
Which CANDLE workflow to use. Currently supported are the grid and bayesian workflows. E.g., workflow = "grid".
ngpus
Number of GPUs you would like to use for the CANDLE job. E.g., ngpus = 2. Note: One (grid workflow) or two (bayesian workflow) extra GPUs will be allocated in order to run background processes.
gpu_type
Type of GPU you would like to use. E.g., gpu_type = "k80". The choices on Biowulf are k20x, k80, p100, and v100.
walltime
How long you would like your job to run (the wall time of your entire job including all hyperparameter sets). E.g., walltime = "00:20:00". Format is HH:MM:SS. When in doubt, round up so that the job is most likely to complete (if it doesn't, use the restart_from_exp setting, below).
Optional settings

Python models only

python_bin_path
If you don’t want to use the Python version with which CANDLE was built (currently python/3.6), you can set this to the location of the Python binary you would like to use. Examples:
python_bin_path = "$CONDA_PREFIX/envs/<YOUR_CONDA_ENVIRONMENT_NAME>/bin"
python_bin_path = "/data/BIDS-HPC/public/software/conda/envs/main3.6/bin"
If set, it will override the setting of exec_python_module, below.
exec_python_module
If you’d prefer loading a module rather than specifying the path to the Python binary (above), set this to the name of the Python module you would like to load. E.g., exec_python_module = "python/2.7". This setting will have no effect if python_bin_path (above) is set. If neither python_bin_path nor exec_python_module is set, then the version of Python with which CANDLE was built (currently python/3.6) will be used.
supp_pythonpath
This is a supplementary setting of the $PYTHONPATH environment variable that will be searched for libraries that can’t otherwise be found. Examples:
supp_pythonpath = "/data/BIDS-HPC/public/software/conda/envs/main3.6/lib/python3.6/site-packages"
supp_pythonpath = "/data/$USER/conda/envs/my_conda_env/lib/python3.6/site-packages"

Tip: Multiple paths can be set by separating them with a colon.

dl_backend
Deep learning library to use. E.g., dl_backend = "pytorch". Should be either keras (default) or pytorch.

R models only

exec_r_module
If you don’t want to use the R version with which CANDLE was built (currently R/3.5.0), set this to the name of the R module you would like to load. E.g., exec_r_module = "R/3.6".
supp_r_libs
This is a supplementary setting of the $R_LIBS environment variable that will be searched for libraries that can’t otherwise be found. E.g., supp_r_libs = "/data/BIDS-HPC/public/software/R/3.6/library". Tip: R will search your standard library location on Biowulf (~/R/%v/library), so feel free to just install your own R libraries there.

Models written in either language

supp_modules
Modules you would like to have loaded while your model is run. E.g., supp_modules = "CUDA/10.0 cuDNN/7.5/CUDA-10.0" (these particular example settings may be necessary for running TensorFlow when using a custom Conda installation).
extra_script_args
Command-line arguments you’d like to include when invoking python or Rscript. E.g., for R model scripts, extra_script_args = "--max-ppsize=100000". In other words, the model will ultimately be run like python $EXTRA_SCRIPT_ARGS my_model_script.py or Rscript $EXTRA_SCRIPT_ARGS my_model_script.R.
use_candle
Whether to use CANDLE to run a workflow (1, default) or to simply run the model on the default set of hyperparameters specified in the &default_model section (0). E.g., use_candle = 0. If set to 0, run from an interactive node (e.g., sinteractive --gres=gpu:k20x:1 --mem=60G --cpus-per-task=16); candle submit-job <INPUT-FILE> (the standard way to run a CANDLE job) will then run the model on the current node rather than submitting a job to the batch queue.
cpus_per_task
Number of CPUs to request that SLURM allocate per GPU (i.e., MPI process). E.g., cpus_per_task = 4. By default CANDLE requests the number of CPUs (and memory) proportional to the total number of GPUs on the node, depending on the required gpu_type setting. The default values of cpus_per_task and mem_per_node (see the following setting) are:
gpu_type    cpus_per_task    mem_per_node
k20x        16               60G
k80         14               60G
p100        14               30G
v100        14               30G

mem_per_node
Amount of memory (including the units) to request that SLURM allocate per GPU (i.e., MPI process). E.g., mem_per_node = "10G". By default CANDLE requests the amount of memory (and CPUs) proportional to the total number of GPUs on the node, depending on the required gpu_type setting. The default values of mem_per_node and cpus_per_task (see the previous setting) are as in the table above.
restart_from_exp
(Experimental) If a grid workflow was run previously but for whatever reason did not complete (such as a too-low setting of walltime), here you can specify the name of the experiment from which to resume. E.g., restart_from_exp = "X002".

bayesian workflow only

design_size
Total number of points to sample within the hyperparameter space prior to running the mlrMBO algorithm. E.g., design_size = 9 (default 10). Note that this must be greater than or equal to the largest number of possible values for any discrete hyperparameter specified in the &param_space section. A reasonable value for this (and for propose_points, below) is 15-20.
propose_points
Number of points proposed (and evaluated) at each MBO iteration. E.g., propose_points = 9 (default 10). A reasonable value for this (and for design_size, above) is 15-20.
max_budget
Maximum total number of function evaluations for all iterations combined. E.g., max_budget = 180 (default 110).
max_iterations
Maximum number of sequential optimization steps. E.g., max_iterations = 3 (default 10).
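
As a rough illustration of how these settings interact (based on the description of the bayesian workflow below): with design_size = 9, propose_points = 9, and max_iterations = 3, at most 9 + 9 × 3 = 36 model trainings would be run, comfortably under the default max_budget of 110 (whichever of the max_iterations and max_budget limits is reached first stops the workflow).
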
Note: When use_candle = 1 (the default behavior), the submission script (which should never be called using sbatch) will automatically request an sbatch job. When use_candle = 0, the submission script should be called the same way (candle submit-job <INPUT-FILE>), but from an interactive (compute) node; this is the best way to test your job (with the default hyperparameter settings) without actually running a CANDLE workflow.

Section 2: &default_model

This section contains the default settings of the hyperparameters defined in the model script, some or all of whose values will be overwritten by those specified in the &param_space section, below. Every hyperparameter specified in the model script must have a default setting specified here.

This section should be otherwise self-explanatory from the sample &default_model section above.

Tip: This is a great place to define constants in your model script (such as a URL from which the training data should be downloaded), rather than hardcoding them into the model script. E.g., you can replace the line

DATA_URL = 'http://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/combo/'

in your Python model script with

DATA_URL = hyperparams['data_url']

and place the line

data_url = 'http://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/combo/'

in the &default_model section of the CANDLE input file. This way, all settings can be changed from a single input file.

Section 3: &param_space

This section specifies how some or all of the hyperparameters defined in the model script (and the &default_model section) are to be varied during a hyperparameter optimization workflow.

grid workflow

The grid workflow refers to a "grid search" hyperparameter optimization in which the hyperparameters are generally varied evenly throughout a specified parameter space.

In the &param_space section for this workflow, each line must be a JSON string specifying the values of the hyperparameters to use in each job, and each string must contain an id key containing a unique name for the hyperparameter set, e.g.:

{"id": "hpset_01", "epochs": 15, "activation": "tanh"}
{"id": "hpset_02", "epochs": 30, "activation": "tanh"}
{"id": "hpset_03", "epochs": 15, "activation": "relu"}
{"id": "hpset_04", "epochs": 30, "activation": "relu"}
{"id": "hpset_05", "epochs": 10, "batch_size": 128}
{"id": "hpset_06", "epochs": 10, "batch_size": 256}
{"id": "hpset_07", "epochs": 10, "batch_size": 512}

Note: This example implies that the epochs, activation, and batch_size hyperparameters must be defined in the &default_model section. It further shows that the full "grid" of values need not be run in the grid workflow; in fact, you can customize by hand every set of hyperparameter values that you'd like to run.

Note: Python’s False, True, and None should be replaced by JSON’s false, true, and null in the &param_space section for the grid workflow.
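
For example, a hyperparameter set using Boolean and null values (the hyperparameter names here are hypothetical) would be written as:

{"id": "hpset_08", "use_batch_norm": true, "weight_decay": null}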

Alternatively, you can use the generate-grid command of candle to create a file called grid_workflow-XXXX.txt containing a full "grid" of hyperparameters. The usage is candle generate-grid <PYTHON-LIST-1> <PYTHON-LIST-2> ..., where each <PYTHON-LIST> is a Python list whose first element is a string containing the hyperparameter name and whose second element is an iterable of hyperparameter values (numpy functions can be accessed using the np variable). For example, running

[user@biowulf]$ candle generate-grid "['nlayers',np.arange(5,15,2)]" "['dir',['x','y','z']]"

will create a file called grid_workflow-XXXX.txt with the contents

{"id": "hpset_00001", "nlayers": 5, "dir": "x"}
{"id": "hpset_00002", "nlayers": 5, "dir": "y"}
{"id": "hpset_00003", "nlayers": 5, "dir": "z"}
{"id": "hpset_00004", "nlayers": 7, "dir": "x"}
{"id": "hpset_00005", "nlayers": 7, "dir": "y"}
{"id": "hpset_00006", "nlayers": 7, "dir": "z"}
{"id": "hpset_00007", "nlayers": 9, "dir": "x"}
{"id": "hpset_00008", "nlayers": 9, "dir": "y"}
{"id": "hpset_00009", "nlayers": 9, "dir": "z"}
{"id": "hpset_00010", "nlayers": 11, "dir": "x"}
{"id": "hpset_00011", "nlayers": 11, "dir": "y"}
{"id": "hpset_00012", "nlayers": 11, "dir": "z"}
{"id": "hpset_00013", "nlayers": 13, "dir": "x"}
{"id": "hpset_00014", "nlayers": 13, "dir": "y"}
{"id": "hpset_00015", "nlayers": 13, "dir": "z"}

Note: The candle module must be loaded in order to run any of the candle commands such as generate-grid.

The contents of the file grid_workflow-XXXX.txt should then be placed in the body of the &param_space section of the input file.

A more complete example producing a 600-line file (600 sets of hyperparameters) is

[user@biowulf]$ candle generate-grid "['john',np.arange(5,15,2)]" "['single_num',[4]]" "['letter',['x','y','z']]" "['arr',[[2,2],None,[2,2,2],[2,2,2,2]]]" "['smith',np.arange(-1,1,0.2)]"

No spaces can be present in any of the arguments to the generate-grid command.

Note: Use Python’s False, True, and None if using the generate-grid command; the output in grid_workflow-XXXX.txt will replace these with JSON’s false, true, and null, respectively.
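
Conceptually, generate-grid simply forms the Cartesian product of the listed hyperparameter values. A rough, self-contained Python sketch of the same idea (not CANDLE's actual implementation) is:

# Illustrative sketch only: enumerate the Cartesian product of hyperparameter values
# and print one JSON line per hyperparameter set, as in grid_workflow-XXXX.txt above.
import itertools
import json
import numpy as np

space = {"nlayers": np.arange(5, 15, 2), "dir": ["x", "y", "z"]}
for i, combo in enumerate(itertools.product(*space.values()), start=1):
    hpset = {"id": "hpset_%05d" % i}
    hpset.update(dict(zip(space.keys(), combo)))
    print(json.dumps(hpset, default=int))   # default=int converts numpy integers for JSON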

bayesian workflow

The bayesian workflow refers to a Bayesian-based hyperparameter optimization in which information about how well prior sets of hyperparameters performed is used to determine the next sets of hyperparameters to try. In this way the HPO algorithm does not sample the full space of hyperparameter values and instead iteratively homes in on the best set of hyperparameters. Compared to a full grid search, this can save significant time when the hyperparameter space is large and the model takes a long time to run on the training data. One drawback is that it is more difficult to observe exactly how each hyperparameter or hyperparameter combination directly affects the model's performance.

The Bayesian algorithm used in CANDLE is an R package called mlrMBO. Briefly, after the (hyper)parameter space has been defined, the algorithm chooses design_size evenly spaced points throughout the space and runs the model on those design_size sets of hyperparameters. A random forest model (called a "surrogate model") then fits the evaluated hyperparameters to their resulting performance metrics (specified either by the val_to_return variable or the history variable as explained above) and produces propose_points new sets of hyperparameters it believes may minimize the metric. The model is then run on these new sets of hyperparameters, after which the algorithm incorporates these hyperparameters and their resulting performance metrics into the surrogate model and proposes propose_points new sets of hyperparameters to try within the defined parameter space. This process is repeated until convergence to the "best" set of hyperparameters, or until max_iterations iterations have been run or max_budget total model runs have been performed.

For more details, please see the mlrMBO package documentation.
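
To give a feel for this loop, below is a toy, self-contained Python sketch of a surrogate-model search (this is not CANDLE or mlrMBO code; it assumes numpy and scikit-learn are available, uses a random rather than evenly spaced initial design, and uses a cheap stand-in "loss" in place of an actual model training):

# Toy illustration of a surrogate-model (Bayesian-style) hyperparameter search
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def loss(hp):
    # Stand-in for training a model and returning its minimization metric
    return (hp[0] - 0.3) ** 2 + (hp[1] - 0.7) ** 2

design_size, propose_points, max_iterations = 10, 5, 3
rng = np.random.default_rng(0)

X = rng.uniform(0, 1, size=(design_size, 2))   # initial design over two "hyperparameters"
y = np.array([loss(x) for x in X])

for _ in range(max_iterations):
    # Fit the surrogate model to all hyperparameter sets evaluated so far
    surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    # Propose the candidate points the surrogate predicts will minimize the metric
    candidates = rng.uniform(0, 1, size=(1000, 2))
    proposals = candidates[np.argsort(surrogate.predict(candidates))[:propose_points]]
    X = np.vstack([X, proposals])
    y = np.concatenate([y, [loss(x) for x in proposals]])

print("best hyperparameters found:", X[np.argmin(y)], "metric:", y.min())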

The &param_space section for the bayesian workflow is based on the makeParamSet function in the ParamHelpers R package. Each line in this section is what would be an argument to makeParamSet() (without the commas separating the arguments); the formatting for this section should be based on this argument format. It is relatively intuitive to understand; e.g., here is the &param_space section in the bayesian template input file:

makeDiscreteParam("batch_size", values = c(16, 32))
makeIntegerParam("epochs", lower = 2, upper = 5)
makeDiscreteParam("optimizer", values = c("adam", "sgd", "rmsprop", "adagrad", "adadelta"))
makeNumericParam("drop", lower = 0, upper = 0.9)
makeNumericParam("learning_rate", lower = 0.00001, upper = 0.1)

This defines the possible values that the hyperparameters batch_size, epochs, optimizer, drop, and learning_rate can take on during the running of the bayesian workflow. Please see the Param help page for individual usage of each type of constructor function.

Aggregating CANDLE Job Results

After a CANDLE job is complete, the results of all jobs run on each set of hyperparameters will be placed in a subdirectory of the experiments directory, which is created in the directory from which the job was submitted. A symbolic link called last-exp at the same level as the experiments directory will point to the last experiment that was run.

Inside each experiment subdirectory is a run directory, which contains one subdirectory per hyperparameter set holding the results of the model script run using that hyperparameter set. In each of these subdirectories is a file called subprocess_out_and_err.txt containing the model's raw output (i.e., what you'd expect to be printed to the terminal if you ran the model completely outside of CANDLE). If the model ran successfully using that hyperparameter set, a file called result.txt will also be present containing the value specified by val_to_return (or history).

Tip: If your CANDLE job dies, looking inside the subprocess_out_and_err.txt files will generally indicate why.

For example, here is a sample directory structure expanding one of the CANDLE experiments directories (X002):

.
├── experiments
│   ├── X000
│   ├── X001
│   └── X002
│       ├── cfg-sys-biowulf.sh
│       ├── grid_workflow-mnist.txt
│       ├── jobid.txt
│       ├── metadata.json
│       ├── output.txt
│       ├── run
│       │   ├── hpset_01
│       │   ├── hpset_02
│       │   ├── hpset_03
│       │   ├── hpset_04
│       │   ├── hpset_05
│       │   ├── hpset_06
│       │   └── hpset_07
│       ├── submit.sh
│       ├── turbine.log
│       ├── turbine-slurm.sh
│       ├── workflow.sh.log
│       └── workflow.tic
├── last-exp -> /data/doeja/candle/experiments/X002
└── submit_candle_job.sh

In order to collect the values of all the hyperparameter sets, as well as the resulting metric for each set, run candle's aggregate-results command:

[user@biowulf]$ candle aggregate-results <EXP-DIR> [<RESULT-FORMAT>]

where <EXP-DIR> is the experiment directory, i.e., the one containing the run directory, and <RESULT-FORMAT> is an optional printf()-style format string specifying the output format for the metric. For example, if the r template/example were run inside the /data/$USER/candle directory, then running

[user@biowulf]$ candle aggregate-results /data/$USER/candle/last-exp

would produce a file called candle_results.csv in the current directory containing the data from all the jobs, sorted by increasing metric value, e.g.,

result,dirname,id,mincorr,maxcorr,number_cv,extfolds
000.796,hpset_00001,hpset_00001,0.200000,0.80,2,5
000.796,hpset_00004,hpset_00004,0.200000,0.80,5,5
000.837,hpset_00002,hpset_00002,0.200000,0.80,3,5
000.878,hpset_00003,hpset_00003,0.200000,0.80,4,5
000.905,hpset_00007,hpset_00007,0.200000,0.80,8,5
000.964,hpset_00005,hpset_00005,0.200000,0.80,6,5
000.964,hpset_00006,hpset_00006,0.200000,0.80,7,5
001.000,hpset_00008,hpset_00008,0.200000,0.80,9,5

This file can be further processed using Excel or any other method in order to study the results of the HPO.
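
For example, one quick way to inspect the aggregated results in Python (a hypothetical snippet, assuming pandas is available):

# Load and inspect the CSV file produced by candle aggregate-results
import pandas as pd

df = pd.read_csv("candle_results.csv")
print(df.sort_values("result").head())   # best (lowest-metric) hyperparameter sets first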

Note: Since the field names (first line above) are extracted only once, if the hyperparameters that are varied are not the same for every set of hyperparameters run using the grid workflow, then the output of the aggregate-results command will not make sense. For example, running this command on the results of the grid template/example will produce

result,dirname,id,epochs,activation
000.064,hpset_01,hpset_01,15,tanh
000.066,hpset_07,hpset_07,10,512
000.074,hpset_06,hpset_06,10,256
000.080,hpset_02,hpset_02,30,tanh
000.081,hpset_05,hpset_05,10,128
000.098,hpset_03,hpset_03,15,relu
000.121,hpset_04,hpset_04,30,relu

As usual, the full pathname must be used for <EXP-DIR>. Tip: Use $(pwd) to automatically include the full path in front of a relative path, e.g., candle aggregate-results $(pwd)/last-exp.

Summary of candle Commands

As long as the candle module is loaded (module load candle), the commands available to the candle program (in the format candle <COMMAND> <COMMAND-ARG-1> <COMMAND-ARG-2> ...) are as follows:

candle import-template <TEMPLATE> - Copy a CANDLE template to the current directory
candle generate-grid <PYTHON-LIST-1> <PYTHON-LIST-2> ... - Generate a hyperparameter grid for the grid search workflow
candle submit-job <INPUT-FILE> - Submit a CANDLE job
candle aggregate-results <EXP-DIR> [<RESULT-FORMAT>] - Create a CSV file called candle_results.csv containing the hyperparameters and corresponding performance metrics

Tip: Leaving <COMMAND> blank or setting it to help will display this usage menu.

Promoting CANDLE and Your Work

If you've successfully used CANDLE to advance your work and you're willing to tell us about it, please email the SDSI team to tell us what you've done! We'd love to learn how users are using CANDLE to address their needs so that we can continue to improve CANDLE and its implementation on Biowulf.

Further, if you're willing to have your work promoted online, please include a representative graphic of your work, and upon review we'll post it here as an exemplar CANDLE success story. More exposure for you, more exposure for us!

Or, if you've unsuccessfully used CANDLE to advance your work, we'd love to help you out; please let us know what didn't work for you!

Contact Information

Feel free to email the SDSI team with any questions, comments, or suggestions.

For notices, links, and updates, please go to https://cbiit.github.com/sdsi/candle.

Finally, our team has expertise in building machine/deep learning models for a variety of situations (e.g., image segmentation, classification from RNA-Seq data, etc.) and would be happy to help you build a model (independent of CANDLE) or point you in the right direction. (And, we are happy to collaborate!)