in-cal / incal-spark_ml

Extension of Spark ML library for the temporal domain served by delay line and reservoir computing kernels/transformers, several classification and regression models, and a convenient customizable pipeline execution.


InCal Spark ML Library

This is an extension of the Spark ML library (version 2.2.0) that provides:

  • Integrated service with a configurable classification and regression execution, cross-validation, and pre-processing.
  • Several handy transformers and evaluators.
  • Extension of classification and regression for the temporal domain, mainly through two kernels that can be combined: a sliding window (delay line) and a reservoir computing network with various topologies and activation functions.
  • Convenient customizable pipeline execution.
  • Summary evaluation metrics.
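
To make the two temporal kernels mentioned above more concrete, here is a minimal sketch in plain Scala (no Spark, and not this library's API). The object name TemporalKernelsSketch and the functions delayLine, reservoirStep, and reservoirStates are hypothetical names chosen for illustration. The random weight scaling is likewise an assumption and not the library's actual initialization.

```scala
import scala.util.Random

// Illustrative only: hypothetical helpers, not part of incal-spark_ml.
object TemporalKernelsSketch {

  // Delay line (sliding window): each row holds windowSize consecutive values,
  // so a model sees the current value together with its recent history.
  def delayLine(series: Seq[Double], windowSize: Int): Seq[Seq[Double]] = {
    require(windowSize > 0, "windowSize must be positive")
    series
      .sliding(windowSize)
      .filter(_.size == windowSize) // drop a trailing partial window, if any
      .map(_.toVector)
      .toVector
  }

  // One reservoir update: x(t) = tanh(wIn * u(t) + w * x(t-1)).
  def reservoirStep(
      wIn: Vector[Double],
      w: Vector[Vector[Double]],
      state: Vector[Double],
      input: Double
  ): Vector[Double] =
    w.indices.toVector.map { i =>
      val recurrent = w(i).zip(state).map { case (a, b) => a * b }.sum
      math.tanh(wIn(i) * input + recurrent)
    }

  // Drives a small, randomly weighted reservoir with the series and returns
  // one state vector per input value. The weights are fixed (untrained);
  // the states would serve as features for a downstream model.
  def reservoirStates(series: Seq[Double], size: Int, seed: Long = 42L): Seq[Vector[Double]] = {
    val rnd = new Random(seed)
    val wIn = Vector.fill(size)(rnd.nextDouble() - 0.5)
    // Small recurrent weights (assumed scaling) keep the states from saturating.
    val w = Vector.fill(size, size)((rnd.nextDouble() - 0.5) * 0.2)
    val initial = Vector.fill(size)(0.0)
    series.scanLeft(initial)((x, u) => reservoirStep(wIn, w, x, u)).tail
  }
}
```

For example, TemporalKernelsSketch.delayLine(Seq(1.0, 2.0, 3.0, 4.0), 2) yields the windows (1.0, 2.0), (2.0, 3.0), and (3.0, 4.0). Combining the kernels would amount to applying a delay line to the reservoir states before they reach a classifier or regressor.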

Examples

Once you have the incal-spark_ml lib on your classpath, you are ready to go. To conveniently launch Spark ML based command-line apps, you can use the SparkMLApp class, which automatically creates and injects two resources: SparkSession and SparkMLService. You can explore and run the following examples demonstrating the basic functionality (all data is public):

as well as example classifications and regressions for temporal problems:

and clustering:

Note that time-series classifications (and predictions) using convolutional neural networks and LSTMs are provided by the InCal DL4J library.

Installation

All you need is Scala 2.11. To pull the library, add the following dependency to build.sbt:

"org.in-cal" %% "incal-spark_ml" % "0.2.3"

or to pom.xml (if you use Maven):

<dependency>
    <groupId>org.in-cal</groupId>
    <artifactId>incal-spark_ml_2.11</artifactId>
    <version>0.2.3</version>
</dependency>
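
For sbt users, the dependency shown above could sit in a build.sbt along the following lines. This is a minimal sketch: the project name and the Scala patch version are assumptions, and only the dependency coordinates come from this README.

```scala
// build.sbt (sketch; project name and Scala patch version are assumptions)
name := "my-incal-app"

// The library targets Scala 2.11; any 2.11.x release should match the binary suffix.
scalaVersion := "2.11.12"

// %% appends the Scala binary version, so this resolves to incal-spark_ml_2.11,
// the same artifactId used in the Maven snippet.
libraryDependencies += "org.in-cal" %% "incal-spark_ml" % "0.2.3"
```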

Acknowledgement

Development of this library has been significantly supported by a one-year MJFF Grant (2018-2019): Scalable Machine Learning And Reservoir Computing Platform for Analyzing Temporal Data Sets in the Context of Parkinson’s Disease and Biomedicine.