Biowulf High Performance Computing at the NIH
U-Net: a convolutional network for biomedical image segmentation

U-Net won the ISBI cell tracking challenge 2015. It relies on strong use of data augmentation to exploit the available annotated samples more efficiently. The architecture consists of a contracting path, which captures context, and a symmetric expanding path, which enables precise localization.
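The contracting/expanding structure with skip connections is easy to see in code. Below is a minimal sketch in Keras; the depth and layer widths are illustrative and much smaller than the published architecture or the implementations installed on Biowulf.

from tensorflow.keras import layers, Model

def tiny_unet(input_shape=(256, 256, 1)):
    inputs = layers.Input(input_shape)
    # contracting path: convolutions capture context, pooling downsamples
    c1 = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, activation="relu", padding="same")(p1)
    p2 = layers.MaxPooling2D()(c2)
    b = layers.Conv2D(64, 3, activation="relu", padding="same")(p2)  # bottleneck
    # expanding path: upsample, then concatenate the matching contracting-path
    # features (skip connection) to recover the localization lost to pooling
    u2 = layers.concatenate([layers.UpSampling2D()(b), c2])
    c3 = layers.Conv2D(32, 3, activation="relu", padding="same")(u2)
    u1 = layers.concatenate([layers.UpSampling2D()(c3), c1])
    c4 = layers.Conv2D(16, 3, activation="relu", padding="same")(u1)
    # 1x1 convolution yields a per-pixel foreground probability
    return Model(inputs, layers.Conv2D(1, 1, activation="sigmoid")(c4))

The skip connections are what distinguish U-Net from a plain encoder/decoder: fine spatial detail from the contracting path is reinjected at each resolution of the expanding path.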

References:

Ronneberger O., Fischer P., Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI 2015. arXiv:1505.04597.
Documentation
Important Notes

Module Name: U-Net (see the modules page for more information)

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=16g --gres=gpu:v100,lscratch:10 -c14
[user@cn3107 ~]$ module load U-Net
Copy the Keras implementation of U-Net to the current directory, then view the available command-line options for training:
[user@cn3107 ~]$ cp -r $UNET_KERAS/* .
[user@cn3107 ~]$ python train.py -h
...
Usage:
    train.py [options (-h to list)]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -b batch_size, --batch_size=batch_size
                        batch size
  -e num_epochs, --num_epochs=num_epochs
                        number of epochs
  -g num_gpus, --num_gpus=num_gpus
                        number of gpus to use
  -n num_images, --num_images=num_images
                        number of images to generate during augmentation
  -v, --verbose         increase the verbosity level of output
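The -n/--num_images option controls how many augmented images are generated from the small training set. Below is a minimal sketch of what such augmentation can look like in Keras; the ImageDataGenerator parameters are illustrative, not the exact transforms used by this train.py:

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,        # small random rotations
    width_shift_range=0.05,   # random horizontal shifts
    height_shift_range=0.05,  # random vertical shifts
    shear_range=0.05,
    zoom_range=0.05,
    horizontal_flip=True,
    fill_mode="nearest",      # fill pixels exposed by the transforms
)

image = np.random.rand(1, 256, 256, 1)          # stand-in for one training image
flow = datagen.flow(image, batch_size=1, seed=1)
augmented = [next(flow)[0] for _ in range(8)]   # eight augmented variants

For segmentation, the same seed would be passed to a second generator running over the masks, so each image/mask pair receives identical transforms.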
Train the network on 1024 augmented images (generated from the original 24 images), using 50 epochs:
[user@cn3107 ~]$ python train.py -e 50 -n 1024
1/512 [..............................] - ETA: 50:36 - loss: 0.6876 - acc: 0.83  
2/512 [..............................] - ETA: 25:35 - loss: 0.6723 - acc: 0.86  
3/512 [..............................] - ETA: 17:15 - loss: 0.6552 - acc: 0.87  
4/512 [..............................] - ETA: 13:04 - loss: 0.6282 - acc: 0.88  
5/512 [..............................] - ETA: 10:34 - loss: 0.5808 - acc: 0.88  
6/512 [..............................] - ETA: 8:54 - loss: 0.5583 - acc: 0.890  
7/512 [..............................] - ETA: 7:42 - loss: 0.5341 - acc: 0.891 
... 
... 
... saving model to MODEL.hdf5
The trained model will be saved to the file MODEL.hdf5.
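The saved model can also be reloaded outside of the provided scripts. A minimal sketch, assuming a standard Keras HDF5 model file; the random input array is a stand-in for a real, preprocessed test image:

import numpy as np
from tensorflow.keras.models import load_model

# compile=False skips restoring the training configuration, which may
# reference custom objects; the weights are all that prediction needs
model = load_model("MODEL.hdf5", compile=False)
img = np.random.rand(1, 256, 256, 1).astype("float32")
prob = model.predict(img)            # per-pixel foreground probability
mask = (prob > 0.5).astype("uint8")  # threshold to a binary segmentation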

Make predictions (i.e. binary segmentation) based on the trained model:
[user@cn3107 ~]$ python predict.py
...
24/24 [==============================] - 4s 165ms/step
..
The predictions will be stored in the folder data/membrane/test.

Now check the usage of the PyTorch implementation of U-Net:
[user@cn3107 ~]$ rm -rf *
[user@cn3107 ~]$ cp -r $UNET_PT/* .
[user@cn3107 ~]$ python train.py -h
Usage: train.py [options]

Options:
  -h, --help            show this help message and exit
  -e EPOCHS, --epochs=EPOCHS
                        number of epochs
  -b BATCHSIZE, --batch-size=BATCHSIZE
                        batch size
  -l LR, --learning-rate=LR
                        learning rate
  -g, --gpu             use cuda
  -c LOAD, --load=LOAD  load file model
  -s SCALE, --scale=SCALE
                        downscaling factor of the images
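The -s/--scale option downscales the input images before they enter the network, trading resolution for GPU memory. A minimal sketch of that kind of preprocessing, assuming PIL; the factor below is illustrative:

from PIL import Image

scale = 0.5  # keep half the original resolution in each dimension
img = Image.open("data/train/000016.jpg")
img = img.resize((int(img.width * scale), int(img.height * scale)))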
Train the PyTorch implementation on the GPU, using 100 epochs and a learning rate of 0.0001:
[user@cn3107 ~]$ python train.py -g -e 100 -l 0.0001
0.0000 --- loss: 0.793130
0.4348 --- loss: 0.641810
0.8696 --- loss: 0.489400
Epoch finished ! Loss: 0.9621700197458267
Checkpoint 1 saved !
Starting epoch 2/100.
0.0000 --- loss: 0.415636
0.4348 --- loss: 0.380256
0.8696 --- loss: 0.295290
Epoch finished ! Loss: 0.5455908477306366
Checkpoint 2 saved !
...
Epoch finished ! Loss: 0.06540998723357916
Checkpoint 100 saved !
Save the checkpoint corresponding to the smallest loss as the final model, then perform a prediction:
[user@cn3107 ~]$ cp checkpoints/CP89.pth MODEL.pth
[user@cn3107 ~]$ python predict.py -i data/train/000016.jpg -o ./000016.jpg -m MODEL.pth -r
[user@cn3107 ~]$
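The checkpoint can also be reused outside of predict.py. A minimal sketch, assuming the checkpoint holds a plain state_dict and that the downloaded sources expose a UNet class (both the import path and the constructor arguments below are assumptions):

import torch
from unet import UNet  # hypothetical import; use the class defined in $UNET_PT

net = UNet(n_channels=3, n_classes=1)  # constructor arguments are illustrative
net.load_state_dict(torch.load("MODEL.pth", map_location="cpu"))
net.eval()                             # freeze dropout/batch-norm statistics
with torch.no_grad():
    x = torch.rand(1, 3, 256, 256)     # stand-in for a preprocessed image
    prob = torch.sigmoid(net(x))       # per-pixel foreground probability
    mask = (prob > 0.5).float()        # threshold to a binary mask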
End the interactive session:
[user@cn3107 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. unet.sh). For example:

#!/bin/bash
module load U-Net 
cp -r $UNET_PT/* .
python train.py -g -e 100 -l 0.0001

Submit this job using the Slurm sbatch command. Since the script trains on the GPU (the -g flag), the job must request one:

sbatch [--cpus-per-task=#] [--mem=#] --partition=gpu --gres=gpu:v100:1 unet.sh
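For example, to mirror the resources requested in the interactive session above (the GPU type, core count, and memory are illustrative, not requirements):

sbatch --partition=gpu --gres=gpu:v100:1,lscratch:10 --cpus-per-task=14 --mem=16g unet.sh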