peterbanda / incal-spark_ml

An extension of the Spark ML library for the temporal domain, built on delay line and reservoir computing kernels/transformers, with several classification and regression models and convenient, customizable pipeline execution.


InCal Spark ML Library

This is an extension of the Spark ML library (version 2.2.0) that provides:

  • An integrated service with configurable classification and regression execution, cross-validation, and pre-processing.
  • Several handy transformers and evaluators.
  • Extension of classification and regression to the temporal domain, mainly through two kernels that can be combined: a sliding window (delay line) and a reservoir computing network with various topologies and activation functions.
  • Convenient customizable pipeline execution.
  • Summary evaluation metrics.
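
To make the sliding window (delay line) kernel mentioned in the list above more concrete, here is a minimal sketch in plain Scala. It does not use the library's own transformers or Spark at all: `DelayLineSketch`, `delayLine`, and `withNextValue` are hypothetical names chosen for illustration, and the one-step-ahead target is an assumption about how such windows are typically paired with labels.

```scala
// Hypothetical helper, NOT part of incal-spark_ml: a plain-Scala delay line.
object DelayLineSketch {

  /** Turns a series into overlapping windows of `windowSize` consecutive values. */
  def delayLine(series: Seq[Double], windowSize: Int): Seq[Seq[Double]] = {
    require(windowSize > 0, "windowSize must be positive")
    // `sliding` would return one partial window for a too-short series, so guard it.
    if (series.size < windowSize) Seq.empty
    else series.sliding(windowSize).toSeq
  }

  /** Pairs each window with the value that follows it (assumed one-step-ahead target). */
  def withNextValue(series: Seq[Double], windowSize: Int): Seq[(Seq[Double], Double)] =
    // `zip` stops at the shorter side, so the last window (which has no successor) is dropped.
    delayLine(series, windowSize).zip(series.drop(windowSize))

  def main(args: Array[String]): Unit = {
    val series = Seq(1.0, 2.0, 3.0, 4.0, 5.0)
    withNextValue(series, 3).foreach { case (features, target) =>
      println(s"${features.mkString("[", ", ", "]")} -> $target")
    }
    // prints:
    // [1.0, 2.0, 3.0] -> 4.0
    // [2.0, 3.0, 4.0] -> 5.0
  }
}
```

Each window acts as a fixed-length feature vector, which is what lets an ordinary (non-temporal) classifier or regressor be applied to a time series.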

Examples

Once you have the incal-spark_ml lib on your classpath, you are ready to go. To conveniently launch Spark ML-based (command-line) apps, you can use the SparkMLApp class, which comes with automatically created/injected resources: SparkSession and SparkMLService. You can explore and run the following examples demonstrating the basic functionality (all data is public):

as well as example classifications and regressions for temporal problems:

and clustering:

Note that time-series classifications (and predictions) using convolutional neural networks and LSTMs are served by the InCal DL4J library.

Acknowledgement

The development of this library has been significantly supported by a one-year MJFF Grant (2018-2019): Scalable Machine Learning And Reservoir Computing Platform for Analyzing Temporal Data Sets in the Context of Parkinson’s Disease and Biomedicine.