Biowulf High Performance Computing at the NIH
meka on Biowulf

The MEKA project provides an open source implementation of methods for multi-label learning and evaluation.


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --cpus-per-task=2 --mem=4g --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load meka
[user@cn3144]$ # Copy the test data
[user@cn3144]$ cp -r $MEKA_TEST_DATA .
[user@cn3144]$ # run a binary relevance/naive bayes classifier on the music data
[user@cn3144]$ java -cp "$MEKA_JARS" meka.classifiers.multilabel.BR \
             -t  data/Music.arff -x 10 -R -W weka.classifiers.bayes.NaiveBayes

== Evaluation Info

Classifier                     meka.classifiers.multilabel.BR
Options                        [-W, weka.classifiers.bayes.NaiveBayes]
Additional Info
Dataset                        Music
Number of labels (L)           6
Type                           ML-CV
Threshold                      0.5
Verbosity                      1


== Predictive Performance

Number of test instances (N)
Accuracy                       0.529
Jaccard index                  0.529
Hamming score                  0.748
Exact match                    0.206


== Additional Measurements (averaged across folds)

Number of training instances   592
Number of test instances       592
Label cardinality (train set)  1.87
Label cardinality (test set)   1.87
Build Time                     0.075
Test Time                      0.023
Total Time                     0.098

[user@cn3144]$ # 'meka' is a small wrapper that automatically defines the java classpath
[user@cn3144]$ meka meka.classifiers.multilabel.BR \
             -t  data/Music.arff -x 10 -R -W weka.classifiers.bayes.NaiveBayes
[user@cn3144]$

[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
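
The command used in the session above can be read option by option. The following is a minimal sketch, not part of the Biowulf installation: build_meka_cmd is a hypothetical helper that assembles the same command line as an array and prints it as a dry run. The option descriptions in the comments reflect our reading of MEKA's command-line help and should be checked against the MEKA documentation.

```shell
#!/bin/bash
# Sketch: assemble the cross-validation command from the sample session.
# build_meka_cmd is a hypothetical helper; it only prints the command.

build_meka_cmd() {
    local data="$1" folds="$2" base="$3"
    local -a cmd
    cmd=(
        meka                             # wrapper that sets the java classpath
        meka.classifiers.multilabel.BR   # multi-label method: binary relevance
        -t "$data"                       # training data in ARFF format
        -x "$folds"                      # number of cross-validation folds
        -R                               # randomize the instance order (assumed meaning)
        -W "$base"                       # WEKA base classifier used per label
    )
    # Join the array with spaces and print it instead of running it.
    printf '%s\n' "${cmd[*]}"
}

build_meka_cmd data/Music.arff 10 weka.classifiers.bayes.NaiveBayes
```

Swapping the third argument for another WEKA classifier class name changes only the base learner; the multi-label method stays the same.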

MEKA also includes a GUI application. To use it, connect to Biowulf either with NX or with X11 forwarding through ssh. Then start an sinteractive session, load the meka module, and launch the GUI with run.sh or meka-gui.
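
A GUI can only open if the session has a graphical display available. As a small sketch, the hypothetical helper run_with_display below refuses to start a command when the DISPLAY environment variable is empty, which usually means that neither NX nor ssh X11 forwarding is active. The helper name and the messages are illustrative assumptions, not part of the meka module.

```shell
#!/bin/bash
# Sketch: run a graphical command only when an X11 display is available.
# run_with_display is a hypothetical helper, not provided by the meka module.

run_with_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; reconnect with NX or ssh X11 forwarding" >&2
        return 1
    fi
    # A display is available: run the requested command with its arguments.
    "$@"
}
```

On a compute node, after loading the module, this would be called as: run_with_display meka-gui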

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. meka.sh) similar to the following example:

#!/bin/bash
# this file is meka.sh

module load meka/1.9.1 || exit 1

cp -r $MEKA_TEST_DATA .
echo "Incremental Classification: Ensembles of Binary Relevance on enron data"
meka meka.classifiers.multilabel.incremental.meta.BaggingMLUpdateable \
    -x 20 -t data/Enron.arff \
    -W meka.classifiers.multilabel.incremental.BRUpdateable -- \
    -W weka.classifiers.bayes.NaiveBayesUpdateable

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=2 --mem=4g meka.sh
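
The interactive session above works in node-local scratch (/lscratch/$SLURM_JOB_ID), while the batch example runs in the submission directory. A batch script could do the same when local scratch is requested. The sketch below assumes that layout; pick_workdir is a hypothetical helper that falls back to the current directory when no scratch directory exists.

```shell
#!/bin/bash
# Sketch: prefer node-local scratch when it was allocated for this job.
# pick_workdir is a hypothetical helper, not part of the meka module.

pick_workdir() {
    local scratch="/lscratch/${SLURM_JOB_ID:-}"
    if [ -n "${SLURM_JOB_ID:-}" ] && [ -d "$scratch" ]; then
        # Scratch was allocated (e.g. with --gres=lscratch:10).
        echo "$scratch"
    else
        # No allocation or not inside a Slurm job: stay where we are.
        pwd
    fi
}
```

In meka.sh this would be used as cd "$(pick_workdir)" || exit 1 before copying the test data, with --gres=lscratch:10 added to the sbatch command as in the sinteractive example. Anything written to local scratch should be copied elsewhere before the job ends if it needs to be kept.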