Weka Data Mining with Singularity

By Staff

Sep 26, 2018 | Blog, How To Guides

Weka is a widely used suite of machine learning algorithms for data mining, written in Java. We’ve developed a Singularity container so that your Weka environment and data can be moved between systems on demand, with all the benefits of the Singularity Image Format (SIF).

Recipe (saved as weka.def):

BootStrap: docker
From: ubuntu:16.04

%post
    apt-get -y update
    apt-get -y install curl
    apt-get -y install unzip
    apt-get install -y openjdk-8-jre
    curl -sSL "https://prdownloads.sourceforge.net/weka/weka-3-8-3.zip" > weka.zip
    unzip weka.zip -d / && rm -f weka.zip*
    echo 'export CLASSPATH=/weka-3-8-3/weka.jar' >> /environment
    apt-get clean

To build the Weka container, we run:

$ sudo singularity build weka.sif weka.def

The container needs no further setup once it is built. Its basic usage, with the classifier name left as a placeholder, is:

$ singularity exec weka.sif java weka.classifiers.[category].[classifier here] [args]

Toy datasets are included with this install. Let’s test one of them with the MultilayerPerceptron classifier:

$ singularity exec weka.sif java weka.classifiers.functions.MultilayerPerceptron \
    -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a -t /weka-3-8-3/data/breast-cancer.arff
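
Every call repeats the same singularity exec weka.sif java prefix, so a small shell wrapper can keep the image path in one place. The following is only a minimal sketch and not part of the recipe above: the names WEKA_SIF, weka_cmd, and weka_run are made up for illustration, and the wrapper falls back to printing the command when singularity or the image is not found.

```shell
#!/bin/sh
# Hypothetical helper names (WEKA_SIF, weka_cmd, weka_run); adjust to taste.
# Path to the image built earlier; override by exporting WEKA_SIF.
WEKA_SIF="${WEKA_SIF:-weka.sif}"

# Print the full command line for a Weka class and its options.
weka_cmd() {
    printf 'singularity exec %s java' "$WEKA_SIF"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Run the class inside the container if singularity and the image exist;
# otherwise print the command that would have been run.
weka_run() {
    if command -v singularity >/dev/null 2>&1 && [ -f "$WEKA_SIF" ]; then
        singularity exec "$WEKA_SIF" java "$@"
    else
        weka_cmd "$@"
    fi
}

# Same classifier and dataset as the command above, with default options.
weka_run weka.classifiers.functions.MultilayerPerceptron \
    -t /weka-3-8-3/data/breast-cancer.arff
```

Because the arguments are passed through unchanged, any of the commands in this post can be run through weka_run by dropping the singularity exec weka.sif java prefix.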

Weka also comes with a multitude of other classifiers. For example, try BayesNet:

$ singularity exec weka.sif java weka.classifiers.bayes.BayesNet -t /weka-3-8-3/data/iris.arff -D \
  -Q weka.classifiers.bayes.net.search.local.K2 -- -P 2 -S ENTROPY \
  -E weka.classifiers.bayes.net.estimate.SimpleEstimator -- -A 1.0

Of course, when you run Weka you’ll want to use real data. Add the -B flag to bind your data directory into the container:

$ singularity exec -B path/to/data:/weka-3-8-3/data weka.sif java weka.classifiers.functions.[function here] \
    -t /weka-3-8-3/data/yourdataset/file.arff [args]
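
To make the bind step concrete, here is a hedged sketch that writes a tiny ARFF file on the host and prints a matching command. The directory name weka-data, the file toy.arff, and its contents are invented for this example. Note that binding over /weka-3-8-3/data hides the bundled toy datasets for that run.

```shell
#!/bin/sh
# Hypothetical host directory for this example; point it at your real data.
DATA_DIR="${DATA_DIR:-$PWD/weka-data}"
mkdir -p "$DATA_DIR"

# An ARFF file is a header (@relation, @attribute lines) followed by
# comma-separated rows under @data. The values below are made up.
cat > "$DATA_DIR/toy.arff" <<'EOF'
@relation toy
@attribute length numeric
@attribute width numeric
@attribute label {small,large}
@data
1.0,0.5,small
1.2,0.4,small
0.9,0.6,small
3.1,2.0,large
2.8,2.2,large
3.3,1.9,large
EOF

# Print the command rather than running it, so the sketch works without
# the image. -x 3 asks for 3-fold cross-validation, since this file has
# fewer rows than Weka's default of 10 folds.
echo "singularity exec -B $DATA_DIR:/weka-3-8-3/data weka.sif java" \
     "weka.classifiers.functions.MultilayerPerceptron" \
     "-t /weka-3-8-3/data/toy.arff -x 3"
```

The last attribute is the class by default, which is why label is declared last.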

For more information about Weka, visit the Weka home page.
