peterbanda / incal-spark_ml

Extension of the Spark ML library for the temporal domain, built on delay line and reservoir computing kernels/transformers, with several classification and regression models and convenient, customizable pipeline execution.


InCal Spark ML Library

This is an extension of the Spark ML library (version 2.2.0) providing:

  • An integrated service with configurable classification and regression execution, cross-validation, and pre-processing.
  • Several handy transformers and evaluators.
  • Extensions of classification and regression for the temporal domain, mainly through two kernels that can be combined: a sliding window (delay line) and a reservoir computing network with various topologies and activation functions.
  • Convenient, customizable pipeline execution.
  • Summary evaluation metrics.
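
To make the two temporal kernels in the list above concrete, here is a minimal plain-Scala sketch of the underlying ideas. It is not the library's API: the object name `TemporalKernelsSketch`, the functions `delayLine` and `reservoirStep`, the weight parameters `w` and `wIn`, and the choice of tanh as the activation function are all assumptions made for illustration.

```scala
// Illustrative sketch only; names and signatures are hypothetical, not taken from incal-spark_ml.
object TemporalKernelsSketch {

  /** Sliding window (delay line): turns a series into overlapping windows
    * of `windowSize` consecutive values, each usable as one feature vector. */
  def delayLine(series: Seq[Double], windowSize: Int): Seq[Seq[Double]] = {
    require(windowSize > 0, "windowSize must be positive")
    // Guard against short series: `sliding` would otherwise return one partial window.
    if (series.size < windowSize) Vector.empty
    else series.sliding(windowSize).toVector
  }

  /** One reservoir state update: x(t) = tanh(W * x(t-1) + wIn * u(t)).
    * `w` is the (assumed fixed) recurrent weight matrix, `wIn` the input weights. */
  def reservoirStep(
    state: Vector[Double],
    input: Double,
    w: Vector[Vector[Double]],
    wIn: Vector[Double]
  ): Vector[Double] =
    w.indices.map { i =>
      val recurrent = w(i).zip(state).map { case (a, b) => a * b }.sum
      math.tanh(recurrent + wIn(i) * input)
    }.toVector

  def main(args: Array[String]): Unit = {
    val series = Seq(1.0, 2.0, 3.0, 4.0, 5.0)
    // Prints the three windows [1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]
    delayLine(series, 3).foreach(w => println(w.mkString("[", ", ", "]")))
  }
}
```

The two ideas can be combined, for example by feeding the series through the reservoir step by step and then windowing the resulting states, but how the library itself composes them is not shown here.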

Installation

All you need is Scala 2.11. To pull the library, add the following dependency to build.sbt:

"org.in-cal" %% "incal-spark_ml" % "0.1.0"

or to pom.xml (if you use Maven):

<dependency>
    <groupId>org.in-cal</groupId>
    <artifactId>incal-spark_ml_2.11</artifactId>
    <version>0.1.0</version>
</dependency>
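
For sbt users, the dependency line would typically sit inside a build.sbt along the following lines. This is a minimal sketch: the project name and the exact Scala patch version are assumptions; only the dependency coordinates and the Scala 2.11 requirement come from this README.

```scala
// build.sbt: minimal sketch; project name and Scala patch version are assumptions
name := "my-spark-ml-app"

// The library requires Scala 2.11; %% appends the matching _2.11 suffix to the artifact name
scalaVersion := "2.11.12"

libraryDependencies += "org.in-cal" %% "incal-spark_ml" % "0.1.0"
```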

Examples

Once you have the incal-spark_ml lib on your classpath, you are ready to go. To conveniently launch Spark ML-based command-line apps, you can use the SparkMLApp class, which automatically creates and injects two resources: SparkSession and SparkMLService. You can explore and run examples demonstrating the basic functionality (all data is public), as well as example classifications and regressions for temporal problems.

Note that time-series classifications (and predictions) using convolutional neural networks and LSTMs are provided by the InCal DL4J library.

Acknowledgement

Development of this library has been significantly supported by a one-year MJFF Grant (2018-2019): Scalable Machine Learning And Reservoir Computing Platform for Analyzing Temporal Data Sets in the Context of Parkinson’s Disease and Biomedicine.