Weka Data Mining with Singularity

By Staff

Sep 26, 2018 | Blog, How To Guides

Weka is a widely used suite of machine learning algorithms for data mining, written in Java. We’ve developed a Singularity container so that your Weka environment and data can be moved between systems on demand, with all the benefits of the Singularity Image Format (SIF).

Recipe:

BootStrap: docker
From: ubuntu:16.04

%post
    apt-get -y update
    apt-get -y install curl
    apt-get -y install unzip
    apt-get install -y openjdk-8-jre
    curl -sSL "https://prdownloads.sourceforge.net/weka/weka-3-8-3.zip" > weka.zip
    unzip weka.zip -d / && rm -f weka.zip*
    echo 'export CLASSPATH=/weka-3-8-3/weka.jar' >> /environment
    apt-get clean

To build the Weka container, we run:

$ sudo singularity build weka.sif weka.def

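Weka needs no further setup once the image is built. Its basic usage is shown below, where weka.classifiers.object stands for the class name of the classifier you want to run: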

$ singularity exec weka.sif java weka.classifiers.object

Toy datasets are included with this install. Let’s test one of them with a command:

$ singularity exec weka.sif java weka.classifiers.functions.MultilayerPerceptron \
    -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a -t /weka-3-8-3/data/breast-cancer.arff
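
The single-letter options in that command are easy to misread, so here is a minimal sketch that spells them out as named shell variables. The option meanings follow Weka's MultilayerPerceptron documentation; the variable names and the mlp_command helper are illustrative assumptions, not part of Weka or Singularity. The helper only prints the command line, so nothing runs until you choose to execute it.

```shell
# Named values for the MultilayerPerceptron options used in the command above.
LEARNING_RATE=0.3        # -L: learning rate
MOMENTUM=0.2             # -M: momentum
EPOCHS=500               # -N: number of training epochs
VALIDATION_PCT=0         # -V: percentage of the data held out for validation
SEED=0                   # -S: random seed
VALIDATION_THRESHOLD=20  # -E: consecutive validation errors allowed before stopping
HIDDEN_LAYERS=a          # -H: hidden layers; 'a' means (attributes + classes) / 2

# mlp_command is a hypothetical helper: it only prints the command line,
# so it can be inspected before being run.
# $1: path to the training .arff file as seen inside the container
mlp_command() {
    printf '%s' "singularity exec weka.sif java weka.classifiers.functions.MultilayerPerceptron"
    printf ' -L %s -M %s -N %s -V %s -S %s -E %s -H %s -t %s\n' \
        "$LEARNING_RATE" "$MOMENTUM" "$EPOCHS" "$VALIDATION_PCT" \
        "$SEED" "$VALIDATION_THRESHOLD" "$HIDDEN_LAYERS" "$1"
}
```

Calling mlp_command /weka-3-8-3/data/breast-cancer.arff prints the same command as above; pipe the output to sh to run it.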

Weka also comes with many other classifiers. Try BayesNet:

$ singularity exec weka.sif java weka.classifiers.bayes.BayesNet -t /weka-3-8-3/data/iris.arff -D \
  -Q weka.classifiers.bayes.net.search.local.K2 -- -P 2 -S ENTROPY \
  -E weka.classifiers.bayes.net.estimate.SimpleEstimator -- -A 1.0

Of course, you’ll want to run Weka on real data. Add the -B flag to bind your data directory into the container (binding over /weka-3-8-3/data hides the bundled toy datasets for that run):

$ singularity exec -B path/to/data:/weka-3-8-3/data weka.sif java weka.classifiers.functions.[function here] \
    -t /weka-3-8-3/data/yourdataset/file.arff [args]
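
A relative host path such as path/to/data depends on the directory you launch from. The sketch below, assuming a POSIX shell, resolves the host directory to an absolute path before building the -B argument. The weka_bind_arg helper is a hypothetical name introduced for illustration; it is not part of Singularity or Weka.

```shell
# weka_bind_arg is a hypothetical helper, not part of Singularity or Weka.
# It prints a "-B" argument that maps a host directory onto the
# container's /weka-3-8-3/data directory, using an absolute host path.
# $1: host directory holding your .arff files (relative or absolute)
weka_bind_arg() {
    if [ ! -d "$1" ]; then
        echo "weka_bind_arg: not a directory: $1" >&2
        return 1
    fi
    # cd inside the command substitution's subshell, so the caller's
    # working directory is left untouched
    host_dir=$(cd "$1" && pwd) || return 1
    printf '%s:/weka-3-8-3/data\n' "$host_dir"
}
```

It can then be used in place of the literal bind argument, for example: singularity exec -B "$(weka_bind_arg path/to/data)" weka.sif java ... with the rest of the command unchanged.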

For more information about Weka, visit the project’s home page.
