Topaz is an application for particle detection in cryo-electron microscopy (cryo-EM). It picks particles using convolutional neural networks trained from positive and unlabeled examples, and can also denoise micrographs and tomograms.
Environment variables set by the module: ${TOPAZ_TEST_DATA} (location of the sample data used below)
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --gres=gpu:k80:1,lscratch:50 --cpus-per-task=8 --mem=4g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn4224 are ready for job
[user@cn4224 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn4224 ~]$ module load topaz/0.2.4
[user@cn4224 ~]$ cp ${TOPAZ_TEST_DATA}/topaz-tutorial-data.tar.gz .
[user@cn4224 ~]$ tar -xzf topaz-tutorial-data.tar.gz
Preprocessing step:
[user@cn4224 ~]$ mkdir -p data/EMPIAR-10025/processed
[user@cn4224 ~]$ mkdir -p data/EMPIAR-10025/processed/micrographs
[user@cn4224 ~]$ topaz preprocess -d 0 -v -s 8 -o \
data/EMPIAR-10025/processed/micrographs/ \
data/EMPIAR-10025/rawdata/micrographs/*.mrc
# processed: 14sep05c_c_00003gr_00014sq_00004hl_00004es_c
# processed: 14sep05c_c_00003gr_00014sq_00005hl_00003es_c
# processed: 14sep05c_c_00003gr_00014sq_00007hl_00004es_c
# processed: 14sep05c_c_00003gr_00014sq_00011hl_00003es_c
# processed: 14sep05c_c_00003gr_00015sq_00015hl_00002es_c
# processed: 14sep05c_c_00003gr_00018sq_00008hl_00003es_c
# processed: 14sep05c_c_00003gr_00018sq_00010hl_00005es_c
# processed: 14sep05c_c_00003gr_00020sq_00011hl_00002es_c
# processed: 14sep05c_c_00003gr_00020sq_00011hl_00004es_c
# processed: 14sep05c_c_00004gr_00031sq_00002hl_00002es_c
# processed: 14sep05c_c_00004gr_00031sq_00005hl_00002es_c
# processed: 14sep05c_c_00004gr_00031sq_00010hl_00002es_c
# processed: 14sep05c_c_00004gr_00032sq_00007hl_00003es_c
# processed: 14sep05c_c_00004gr_00032sq_00010hl_00003es_c
# processed: 14sep05c_c_00004gr_00032sq_00029hl_00005es_c
# processed: 14sep05c_c_00004gr_00032sq_00031hl_00002es_c
# processed: 14sep05c_c_00004gr_00032sq_00033hl_00005es_c
# processed: 14sep05c_c_00004gr_00032sq_00037hl_00002es_c
# processed: 14sep05c_c_00004gr_00032sq_00037hl_00003es_c
# processed: 14sep05c_c_00004gr_00032sq_00040hl_00002es_c
# processed: 14sep05c_c_00004gr_00032sq_00040hl_00004es_c
# processed: 14sep05c_c_00004gr_00032sq_00041hl_00005es_c
# processed: 14sep05c_c_00007gr_00013sq_00004hl_00003es_c
# processed: 14sep05c_c_00007gr_00013sq_00005hl_00002es_c
# processed: 14sep05c_c_00007gr_00013sq_00006hl_00002es_c
# processed: 14sep05c_c_00007gr_00013sq_00008hl_00003es_c
# processed: 14sep05c_c_00007gr_00013sq_00008hl_00004es_c
# processed: 14sep05c_c_00007gr_00013sq_00009hl_00002es_c
# processed: 14sep05c_c_00007gr_00013sq_00009hl_00004es_c
# processed: 14sep05c_c_00007gr_00013sq_00014hl_00004es_c
[user@cn4224 ~]$ topaz convert -s 8 -o \
data/EMPIAR-10025/processed/particles.txt \
data/EMPIAR-10025/rawdata/particles.txt
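The -s 8 argument to topaz convert rescales the particle coordinates by the same factor used to downsample the micrographs, so the picks line up with the processed images. Conceptually, the rescaling is equivalent to the following sketch; the real file format (headers, column names) is handled by topaz convert, and the tab-separated name/x/y layout here is only a hypothetical illustration:

```shell
# Divide x and y coordinates (columns 2 and 3) by the downsampling factor s=8:
printf 'mic_001\t800\t1600\n' \
    | awk -F'\t' -v OFS='\t' -v s=8 '{ $2 = int($2/s); $3 = int($3/s); print }'
# mic_001	100	200
```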
Training step:
[user@cn4224 ~]$ mkdir -p saved_models
[user@cn4224 ~]$ mkdir -p saved_models/EMPIAR-10025
[user@cn4224 ~]$ topaz train -n 400 \
--num-workers=8 \
--train-images data/EMPIAR-10025/processed/micrographs/ \
--train-targets data/EMPIAR-10025/processed/particles.txt \
--save-prefix=saved_models/EMPIAR-10025/model \
-o saved_models/EMPIAR-10025/model_training.txt
# Loading model: resnet8
# Model parameters: units=32, dropout=0.0, bn=on
# Loading pretrained model: resnet8_u32
# Receptive field: 71
# Using device=0 with cuda=True
# Loaded 30 training micrographs with 1500 labeled particles
# source split p_observed num_positive_regions total_regions
# 0 train 0.00163 43500 26669790
# Specified expected number of particle per micrograph = 400.0
# With radius = 3
# Setting pi = 0.0130484716977524
# minibatch_size=256, epoch_size=1000, num_epochs=10
# Done!
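The pi value reported in the training log is the estimated fraction of positive regions per micrograph, and it can be reproduced from the other numbers in the log: 43500 positive regions / 1500 particles = 29 regions per particle (radius 3), 26669790 total regions / 30 micrographs = 888993 regions per micrograph, so pi = 400 × 29 / 888993 ≈ 0.01305, matching the value above. A quick arithmetic check:

```shell
# Reproduce pi from the training log:
#   regions per particle   = 43500 / 1500   = 29    (radius 3)
#   regions per micrograph = 26669790 / 30  = 888993
#   pi = expected particles per micrograph * 29 / 888993
awk 'BEGIN { printf "%.6f\n", 400 * (43500 / 1500) / (26669790 / 30) }'
# 0.013048
```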
Extraction step:
[user@cn4224 ~]$ mkdir -p data/EMPIAR-10025/topaz
[user@cn4224 ~]$ topaz extract -r 14 -x 8 -m \
saved_models/EMPIAR-10025/model_epoch10.sav \
-o data/EMPIAR-10025/topaz/predicted_particles_all_upsampled.txt \
data/EMPIAR-10025/processed/micrographs/*.mrc
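topaz extract writes one predicted particle per line with a score, and a final particle set is typically obtained by thresholding those scores. As a minimal sketch (assuming a hypothetical tab-separated name/x/y/score layout; the actual output columns are defined by topaz), keeping only non-negative scores looks like:

```shell
# Keep only predictions with score >= 0 (hypothetical name/x/y/score columns):
printf 'img_a\t10\t20\t1.5\nimg_a\t30\t40\t-2.0\n' \
    | awk -F'\t' '$4 >= 0'
# img_a	10	20	1.5
```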
[user@cn4224 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch input file (e.g. topaz.sh) similar to the following.
#!/bin/bash
set -e
module load topaz
topaz preprocess -d 0 -v -s 8 -o \
    data/EMPIAR-10025/processed/micrographs/ \
    data/EMPIAR-10025/rawdata/micrographs/*.mrc
topaz convert -s 8 -o \
    data/EMPIAR-10025/processed/particles.txt \
    data/EMPIAR-10025/rawdata/particles.txt
topaz train -n 400 \
    --num-workers=8 \
    --train-images data/EMPIAR-10025/processed/micrographs/ \
    --train-targets data/EMPIAR-10025/processed/particles.txt \
    --save-prefix=saved_models/EMPIAR-10025/model \
    -o saved_models/EMPIAR-10025/model_training.txt
topaz extract -r 14 -x 8 -m \
    saved_models/EMPIAR-10025/model_epoch10.sav \
    -o data/EMPIAR-10025/topaz/predicted_particles_all_upsampled.txt \
    data/EMPIAR-10025/processed/micrographs/*.mrc
Submit this job using the Slurm sbatch command, requesting resources similar to the interactive session above. For example:
sbatch --gres=gpu:k80:1,lscratch:50 --cpus-per-task=8 --mem=4g topaz.sh
Create a swarmfile for the first step of the pipeline (e.g. topaz.swarm). For example:
topaz preprocess -d 0 -v -s 8 -o \
    dataset1/processed/micrographs/ \
    dataset1/rawdata/micrographs/*.mrc
topaz preprocess -d 0 -v -s 8 -o \
    dataset2/processed/micrographs/ \
    dataset2/rawdata/micrographs/*.mrc
topaz preprocess -d 0 -v -s 8 -o \
    dataset3/processed/micrographs/ \
    dataset3/rawdata/micrographs/*.mrc
Submit this job using the swarm command.
swarm -f topaz.swarm [-g #] --module topaz
where
| -g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
| --module topaz | Loads the topaz module for each subjob in the swarm |