valdanylchuk / swiftlearner   0.2.0

BSD 2-clause "Simplified" License GitHub

SwiftLearner: Scala machine learning library

Scala versions: 2.11

Join the chat at https://gitter.im/valdanylchuk/swiftlearner

These are some simply written machine learning algorithms. They are easier to follow than the optimized libraries, and easier to tweak if you want to experiment. They use plain Java types and have few or no dependencies. SwiftLearner is easy to fork; you can also copy-paste the individual methods.

Some of the methods are very short, thanks to the elegance of the classic algorithms and the expressive power of Scala. Others are optimized slightly, just enough to accommodate the test datasets. Those are not idiomatic Scala; they are closer to CS 101 while loops, which are longer but perform better. They are still easy to follow.
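As an illustration of the two styles, here is a minimal sketch of a dot product written both ways. It is not taken from the SwiftLearner sources; the object and method names (`DotProductStyles`, `dotIdiomatic`, `dotWhileLoop`) are made up for this example.

```scala
object DotProductStyles {
  // Idiomatic version: short and declarative, but it builds
  // intermediate collections (the zipped pairs and the products).
  def dotIdiomatic(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  // "CS 101" version: longer, but it walks the arrays once
  // with a plain index and allocates nothing.
  def dotWhileLoop(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }
}
```

Both methods return the same value for the same input; for example, `Array(1.0, 2.0, 3.0)` and `Array(4.0, 5.0, 6.0)` give `32.0` either way. The difference only starts to matter in inner loops that run over large datasets.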

Use this project as a prototyping library, a cookbook, or a cheat sheet. For high performance and rich features, there are better options. Still, these methods are fully functional and work well for small datasets.

The name comes from Fallout, the greatest game ever. Fallout is a trademark of Bethesda Softworks LLC.

To make one ML enthusiast happy, please star or fork this project ;)

Examples

Most of the examples I have written so far are small enough to fit in the tests, so take a look there.

Fisher iris flowers dataset

Iris Virginica flower; credit: Wikimedia Commons

One example is classifying the classic Fisher Iris flower dataset with different algorithms:

The accuracy for backprop and the genetic algorithm goes higher with longer training; these figures are for the quick settings in the automated tests.
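To give a feel for how small such a classifier can be, here is a self-contained nearest-neighbor sketch over a handful of rows from the Iris dataset. It is an illustration only, not SwiftLearner's implementation or API: `NearestNeighborSketch`, `Example`, and `classify` are hypothetical names, and six training rows are far too few for a meaningful accuracy figure.

```scala
object NearestNeighborSketch {
  // One labeled flower: sepal length, sepal width, petal length,
  // petal width (all in cm), plus the species name.
  final case class Example(features: Vector[Double], label: String)

  // A few rows from the Iris dataset, two per species.
  val training: Vector[Example] = Vector(
    Example(Vector(5.1, 3.5, 1.4, 0.2), "setosa"),
    Example(Vector(4.9, 3.0, 1.4, 0.2), "setosa"),
    Example(Vector(7.0, 3.2, 4.7, 1.4), "versicolor"),
    Example(Vector(6.4, 3.2, 4.5, 1.5), "versicolor"),
    Example(Vector(6.3, 3.3, 6.0, 2.5), "virginica"),
    Example(Vector(5.8, 2.7, 5.1, 1.9), "virginica")
  )

  // Squared Euclidean distance; the square root is not needed
  // when distances are only compared with each other.
  def squaredDistance(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // 1-nearest-neighbor: return the label of the closest training example.
  def classify(query: Vector[Double], examples: Vector[Example]): String =
    examples.minBy(e => squaredDistance(query, e.features)).label
}
```

Calling `NearestNeighborSketch.classify(Vector(4.7, 3.2, 1.3, 0.2), NearestNeighborSketch.training)` returns `"setosa"`, since the closest training row is a setosa flower.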

Hotel recommendation

This is based on the Expedia hotel recommendations competition on Kaggle.

I have extracted a subset of the fields and data rows to test with NN/Backprop. This is not a full solution, only a small technical demo:

SwiftLearner hotels example

The SwiftLearner backprop classifier scales fine to thousands of inputs and millions of examples. The prediction accuracy achieved so far is 0.058, which is nothing spectacular, but it is certainly evidence of some learning compared to a random guess at 0.01.
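The comparison with a random guess can be made concrete with a small sketch. It assumes that the 0.01 figure means guessing uniformly among 100 classes (inferred from the number itself, not stated above); `AccuracySketch` and its methods are hypothetical helpers, not part of SwiftLearner.

```scala
object AccuracySketch {
  // Fraction of predictions that match the true labels.
  def accuracy(predicted: Seq[Int], actual: Seq[Int]): Double = {
    require(predicted.nonEmpty && predicted.length == actual.length,
      "need two non-empty sequences of the same length")
    predicted.zip(actual).count { case (p, a) => p == a }.toDouble / predicted.length
  }

  // Expected accuracy when guessing uniformly at random
  // among numClasses equally likely labels.
  def randomBaseline(numClasses: Int): Double = {
    require(numClasses > 0, "need at least one class")
    1.0 / numClasses
  }
}
```

Under that assumption, `AccuracySketch.randomBaseline(100)` gives 0.01, so an accuracy of 0.058 is roughly 5.8 times better than chance.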

MNIST handwritten digits

Another classic example is classifying the handwritten digits from the MNIST database:

Setup

Add the following line to your build.sbt:

libraryDependencies += "com.danylchuk" %% "swiftlearner" % "0.2.0"
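The `%%` operator makes sbt append the Scala binary version to the artifact name, so the project's `scalaVersion` has to match a version the library is published for (2.11, per the listing above). A minimal build.sbt might look like the sketch below; the project name and the exact 2.11.x patch release are assumptions for illustration.

```scala
// build.sbt (sketch; "my-ml-experiments" is a placeholder project name)
name := "my-ml-experiments"

// Any 2.11.x release should do; 2.11.12 is assumed here.
scalaVersion := "2.11.12"

// With scalaVersion 2.11.x, %% resolves this to swiftlearner_2.11.
libraryDependencies += "com.danylchuk" %% "swiftlearner" % "0.2.0"
```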

License

This is free software under a BSD-style license. Copyright (c) 2016 Valentyn Danylchuk. See LICENSE for details.