Biowulf High Performance Computing at the NIH
deepmedic on Biowulf

DeepMedic aims to offer easy access to deep learning for segmentation of structures of interest in biomedical 3D scans. It allows straightforward creation of a 3D convolutional neural network, which can be trained to detect and segment structures when corresponding ground-truth labels are provided. The system processes NIfTI images, which makes it easy to apply to many biomedical tasks.

References:
Kamnitsas et al., "Efficient Multi-Scale 3D CNN with Fully Connected CRF for Accurate Brain Lesion Segmentation", Medical Image Analysis, 2017.

Documentation
deepmedic on GitHub: https://github.com/Kamnitsask/deepmedic

Important Notes
Module Name: deepmedic
Example files in /usr/local/apps/deepmedic/TEST_DATA

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem 10g --gres=gpu:k20x:1
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load deepmedic

[user@cn3144 ~]$ deepMedicRun -h
usage: DeepMedic [-h] [-model MODEL_CFG] [-train TRAIN_CFG] [-test TEST_CFG]
                 [-load SAVED_MODEL] [-dev DEVICE] [-resetopt]

This software allows creation and supervised training of 3D, multi-scale CNN models for segmentation of structures in biomedical NIFTI volumes.
The project is hosted at: https://github.com/Kamnitsask/deepmedic 
See the documentation for details on its use.
This software accompanies the research presented in:
Kamnitsas et al, "Efficient Multi-Scale 3D CNN with Fully Connected CRF for Accurate Brain Lesion Segmentation", Biomedical Image Analysis, 2016.
We hope our work aids you in your endeavours.
For questions and feedback contact: konstantinos.kamnitsas12@ic.ac.uk

optional arguments:
  -h, --help         show this help message and exit
  -model MODEL_CFG   Specify the architecture of the model to be used, by providing a config file [MODEL_CFG].
  -train TRAIN_CFG   Train a model with training parameters given by specifying config file [TRAINING_CFG].
                     This option must follow a [-model MODEL_CFG] option, so that architecture of the to-train model is specified.
                     Additionally, an existing checkpoint of the model can be specified in the [TRAIN_CFG] file or by the additional option [-load], to continue training it.
  -test TEST_CFG     Test with an existing model. The testing session's parameters should be given in config file [TEST_CFG].
                     This option must follow a [-model MODEL_CFG] option, so that architecture of the model is specified.
                     Existing pretrained model can be specified in the given [TEST_CFG] file or by the additional option [-load].
                     This option cannot be used in combination with [-train].
  -load SAVED_MODEL  The path to a saved existing checkpoint with learnt weights of the model, to train or test with.
                     This option must follow a [-train] or [-test] option.
                     If given, this option will override any "model" parameters given in the [TRAIN_CFG] or [TEST_CFG] files.
  -dev DEVICE        Specify the device to run the process on. Values: [cpu] or [cuda] (default = cpu).
                     In the case of multiple GPUs, specify a particular GPU device with a number, in the format: -dev cuda0 
                     NOTE: For GPU processing, CUDA libraries must be first added in your environment's PATH and LD_LIBRARY_PATH. See accompanying documentation.
  -resetopt          Use optionally with a [-train] command. Does not take an argument.
                     Usage: ./deepMedicRun -model /path/to/model/config -train /path/to/train/config -resetopt ...etc...
                     Resets the model's optimization state before starting the training session (eg number of epochs already trained, current learning rate etc).
                     IMPORTANT: Trainable parameters are NOT reinitialized! 
                     Useful to begin a secondary training session with new learning-rate schedule, in order to fine-tune a previously trained model (Doc., Sec. 3.2)

[user@cn3144 ~]$ cp -r /usr/local/apps/deepmedic/TEST_DATA .

[user@cn3144 ~]$ cd TEST_DATA
NOTE: Slurm sets $CUDA_VISIBLE_DEVICES to the GPU allocated to your job, so leave the "-dev cuda${CUDA_VISIBLE_DEVICES}" option exactly as written; changing it could send the computation to a GPU that was not allocated to you.

[user@cn3144 ~]$ deepMedicRun -model configFiles/tinyCnn/model/modelConfig.cfg -train configFiles/tinyCnn/train/trainConfigWithValidation.cfg -dev cuda${CUDA_VISIBLE_DEVICES}
Given configuration file:  /spin1/scratch/teacher/TEST_DATA/configFiles/tinyCnn/model/modelConfig.cfg
Given configuration file:  /spin1/scratch/teacher/TEST_DATA/configFiles/tinyCnn/train/trainConfigWithValidation.cfg
Creating necessary folders for training session...
=============================== logger created =======================================

======================== Starting new session ============================
Command line arguments given: 
Namespace(device='cuda1', model_cfg='configFiles/tinyCnn/model/modelConfig.cfg', reset_trainer=False, saved_model=None, test_cfg=None, train_cfg='configFiles/tinyCnn/train/trainConfigWithValidation.cfg')
2019-02-21 11:38:01.861666: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Tesla K20Xm major: 3 minor: 5 memoryClockRate(GHz): 0.732
pciBusID: 0000:27:00.0
totalMemory: 5.94GiB freeMemory: 5.87GiB
2019-02-21 11:38:01.861775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-21 11:38:02.340802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-21 11:38:02.340880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-02-21 11:38:02.340908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-02-21 11:38:02.341239: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 5662 MB memory) -> physical GPU (device: 0, name: Tesla K20Xm, pci bus id: 0000:27:00.0, compute capability: 3.5)
[...]
EXPLANATION: DICE1/2/3 are lists with the DICE per class. For Class-0, we calculate DICE for whole foreground, i.e all labels merged, except the background label=0. Useful for multi-class problems.
EXPLANATION: DICE1 is calculated as segmentation over whole volume VS whole Ground Truth (GT). DICE2 is the segmentation within the ROI vs GT. DICE3 is segmentation within the ROI vs the GT within the ROI.
EXPLANATION: If an ROI mask has been provided, you should be consulting DICE2 or DICE3.
+++++++++++++++++++++++++++++++ Segmentation of all subjects finished +++++++++++++++++++++++++++++++++++
+++++++++++++++++++++ Reporting Average Segmentation Metrics over all subjects ++++++++++++++++++++++++++
ACCURACY: (Validation) The Per-Class average DICE Coefficients over all subjects are: DICE1=[ 0.9083 0.2183 0.6490 0.0000 0.4908 ] DICE2=[ 0.9084 0.2183 0.6490 0.0000 0.4908 ] DICE3=[ 0.9086 0.2183 0.6492 0.0000 0.4908 ]
EXPLANATION: DICE1/2/3 are lists with the DICE per class. For Class-0, we calculate DICE for whole foreground, i.e all labels merged, except the background label=0. Useful for multi-class problems.
EXPLANATION: DICE1 is calculated as segmentation over whole volume VS whole Ground Truth (GT). DICE2 is the segmentation within the ROI vs GT. DICE3 is segmentation within the ROI vs the GT within the ROI.
EXPLANATION: If an ROI mask has been provided, you should be consulting DICE2 or DICE3.
TIMING: Validation process lasted: 13.45 secs.
###########################################################################################################
############################# Finished full Segmentation of Validation subjects ##########################
###########################################################################################################
TIMING: Training process lasted: 93.4 secs.
Closing worker pool.
Saving the final model at:/spin1/scratch/teacher/TEST_DATA/output/saved_models//trainSessionWithValidTiny//tinyCnn.trainSessionWithValidTiny.final.2019-02-21.11.40.09.341982
The whole do_training() function has finished.

=======================================================
=========== Training session finished =================
=======================================================
Finished.

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
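
The session above only trains the tiny example model. To segment new volumes with the resulting checkpoint, the -test and -load options described in the help text can be combined along the following lines; this is only a sketch, and the test config path and checkpoint value are placeholders based on the layout of the example configs, so substitute your own files:

module load deepmedic
# Placeholder: point this at the checkpoint written under output/saved_models/ at the end of training
CHECKPOINT=output/saved_models/trainSessionWithValidTiny/tinyCnn.trainSessionWithValidTiny.final.DATE
deepMedicRun -model configFiles/tinyCnn/model/modelConfig.cfg \
             -test configFiles/tinyCnn/test/testConfig.cfg \
             -load $CHECKPOINT \
             -dev cuda${CUDA_VISIBLE_DEVICES}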

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. deepmedic.sh). For example:

#!/bin/bash
set -e
module load deepmedic
# The config paths below are relative, so submit this script from the directory
# that contains the configFiles/ tree (the TEST_DATA copy in this example)
deepMedicRun -model configFiles/tinyCnn/model/modelConfig.cfg -train configFiles/tinyCnn/train/trainConfigWithValidation.cfg -dev cuda${CUDA_VISIBLE_DEVICES}

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=8 --mem=10g --partition=gpu --gres=gpu:k20x:1 deepmedic.sh
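
Because the config paths in deepmedic.sh are relative, submit the job from the directory that holds the configFiles/ tree; a short sketch, assuming the TEST_DATA copy from the interactive example is in your home directory:

cd ~/TEST_DATA    # assumption: location of the copied TEST_DATA directory
sbatch --cpus-per-task=8 --mem=10g --partition=gpu --gres=gpu:k20x:1 deepmedic.sh
squeue -u $USER   # check the status of the submitted job
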
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. deepmedic.swarm). For example:

deepMedicRun -model model1.cfg -train train1.cfg -dev cuda${CUDA_VISIBLE_DEVICES}
deepMedicRun -model model2.cfg -train train2.cfg -dev cuda${CUDA_VISIBLE_DEVICES}
deepMedicRun -model model3.cfg -train train3.cfg -dev cuda${CUDA_VISIBLE_DEVICES}
Note: Specify different output paths in the training configuration to avoid output files being overwritten.
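
Before submitting, it may help to confirm that each training config really writes to its own location; a minimal sketch, assuming the config file names used above and the usual deepmedic parameter names (sessionName, folderForOutput):

# print the session name and output folder defined in each train config
grep -H -E 'sessionName|folderForOutput' train1.cfg train2.cfg train3.cfg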

Submit this job using the swarm command.

swarm -f deepmedic.swarm -g 10 -t 8 --partition=gpu --gres=gpu:k20x:1 --module deepmedic
where
-g # Number of Gigabytes of memory required for each process (1 line in the swarm command file).
-t # Number of threads/CPUs required for each process (1 line in the swarm command file).
--module deepmedic Loads the deepmedic module for each subjob in the swarm.
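
For a larger set of configurations, the swarm file above can also be generated with a loop rather than written by hand; a minimal sketch, assuming the numbered model*/train* config naming used above (the \$ is escaped so that ${CUDA_VISIBLE_DEVICES} is expanded on the compute node, not when the file is created):

# write one deepMedicRun command per configuration pair into deepmedic.swarm
for i in 1 2 3; do
    echo "deepMedicRun -model model${i}.cfg -train train${i}.cfg -dev cuda\${CUDA_VISIBLE_DEVICES}"
done > deepmedic.swarm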