High-Performance Computing at the NIH
meka on Biowulf & Helix

Description

A multi-label/multi-target extension to WEKA.

Multiple versions of meka may be available. The easiest way to select a version is with modules. To list the available modules, type

module avail meka 

To select a module, use

module load meka/[version]

where [version] is the version of choice.

Environment variables set

Documentation

Interactive job on Biowulf

Allocate an interactive session with sinteractive and use meka as described below:

biowulf$ sinteractive --cpus-per-task=2 --mem=4g
salloc.exe: Pending job allocation 38978697
[...snip...]
salloc.exe: Nodes cn2273 are ready for job
node$ module load meka/1.9.1
[+] Loading meka 1.9.1
node$ # Copy the test data
node$ cp -r $MEKA_TEST_DATA .
node$ # run a binary relevance/naive bayes classifier on the music data
node$ java -cp "$MEKA_JARS" meka.classifiers.multilabel.BR \
             -t  data/Music.arff -x 10 -R -W weka.classifiers.bayes.NaiveBayes

== Evaluation Info

Classifier                     meka.classifiers.multilabel.BR
Options                        [-W, weka.classifiers.bayes.NaiveBayes]
Additional Info
Dataset                        Music
Number of labels (L)           6
Type                           ML-CV
Threshold                      0.5
Verbosity                      1


== Predictive Performance

Number of test instances (N)
Accuracy                       0.529
Jaccard index                  0.529
Hamming score                  0.748
Exact match                    0.206


== Additional Measurements (averaged across folds)

Number of training instances   592
Number of test instances       592
Label cardinality (train set)  1.87
Label cardinality (test set)   1.87
Build Time                     0.075
Test Time                      0.023
Total Time                     0.098

node$ # 'meka' is a small wrapper that automatically defines the java classpath
node$ meka meka.classifiers.multilabel.BR \
             -t  data/Music.arff -x 10 -R -W weka.classifiers.bayes.NaiveBayes
[...snip...]
node$ exit
biowulf$
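
The wrapper call shown in the session can be repeated for several base classifiers to compare them on the same data. The following is a minimal sketch, assuming the meka module is loaded and data/Music.arff has been copied as in the session. The function names and the MEKA_CMD variable are hypothetical helpers introduced here for illustration; they are not part of meka. weka.classifiers.trees.J48 is a standard WEKA classifier used only as a second example.

```shell
# Sketch: run binary relevance (BR) with several WEKA base classifiers.
# Assumes 'module load meka/1.9.1' was run and data/Music.arff exists.
# MEKA_CMD is a hypothetical override for dry runs; it is not part of meka.
MEKA_CMD=${MEKA_CMD:-meka}

run_br() {
    # $1: ARFF file, $2: WEKA base classifier class name
    # Same options as the interactive example: 10-fold CV (-x 10), randomized (-R).
    "$MEKA_CMD" meka.classifiers.multilabel.BR \
        -t "$1" -x 10 -R -W "$2"
}

compare_base_classifiers() {
    # $1: ARFF file; remaining arguments: base classifier class names
    local data=$1
    shift
    local clf
    for clf in "$@"; do
        echo "### base classifier: $clf"
        run_br "$data" "$clf"
    done
}
```

For example, `compare_base_classifiers data/Music.arff weka.classifiers.bayes.NaiveBayes weka.classifiers.trees.J48` would run one evaluation per classifier. Setting `MEKA_CMD=echo` first prints the command lines instead of running them.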

meka also includes a GUI application. To use the GUI, connect to Biowulf either with the NX player or with X11 forwarding through ssh, then start the sinteractive session.
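
One way to start the GUI from such a session is sketched below. It assumes the meka module is loaded, so that $MEKA_JARS is set as in the command-line example. It also assumes the upstream MEKA class meka.gui.explorer.Explorer is the desired entry point; the module may provide its own launcher instead. start_meka_gui is a hypothetical helper name, not part of meka.

```shell
# Sketch: start the MEKA GUI only when an X display is available.
# start_meka_gui is a hypothetical helper, not part of the meka module.
start_meka_gui() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set: connect with NX or 'ssh -Y' first" >&2
        return 1
    fi
    # meka.gui.explorer.Explorer is the upstream MEKA explorer class; it is
    # assumed to be on the classpath defined by $MEKA_JARS.
    java -cp "$MEKA_JARS" meka.gui.explorer.Explorer
}
```

The DISPLAY check fails early with a readable message when the connection was made without NX or X11 forwarding, instead of producing a Java exception.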

Batch job on Biowulf

Create a batch script similar to the following example:

#! /bin/bash
# this file is meka.batch

module load meka/1.9.1 || exit 1

cp -r $MEKA_TEST_DATA .
echo "Incremental Classification: Ensembles of Binary Relevance on enron data"
meka meka.classifiers.multilabel.incremental.meta.BaggingMLUpdateable \
    -x 20 -t data/Enron.arff \
    -W meka.classifiers.multilabel.incremental.BRUpdateable -- \
    -W weka.classifiers.bayes.NaiveBayesUpdateable

Submit to the queue with sbatch:

biowulf$ sbatch meka.batch
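
Resources can also be requested at submission time. The sketch below wraps sbatch with the same --cpus-per-task and --mem values used for the interactive session above. These values are an assumption carried over for illustration, not a tuned recommendation. submit_meka_job and SBATCH_CMD are hypothetical names introduced here.

```shell
# Sketch: submit a meka batch script with explicit CPU and memory requests.
# SBATCH_CMD is a hypothetical override for dry runs (e.g. SBATCH_CMD=echo).
SBATCH_CMD=${SBATCH_CMD:-sbatch}

submit_meka_job() {
    # $1: batch script; $2: CPUs per task (default 2); $3: memory (default 4g)
    "$SBATCH_CMD" --cpus-per-task="${2:-2}" --mem="${3:-4g}" "$1"
}
```

For example, `submit_meka_job meka.batch 4 8g` would request 4 CPUs and 8 GB of memory for the job.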