Integrating SMILE and Spark

Spark SMILE


Deprecated repository: all features have been upstreamed to the official SMILE repository.

This repository provides better integration between Spark MLlib Pipelines and the SMILE library.

Setup

Add the dependency, which is published on Maven Central:

SBT

libraryDependencies += "com.github.pierrenodet" %% "spark-smile" % "0.0.2"

Maven

<dependency>
  <groupId>com.github.pierrenodet</groupId>
  <artifactId>spark-smile_2.12</artifactId>
  <version>0.0.2</version>
</dependency>

What's inside

This repository contains:

  • Distributed GridSearch of SMILE trainers with Spark
  • Integration of SMILE with Spark MLlib Pipelines
  • Conversion between SMILE DataFrames and Spark DataFrames, in both directions

How to use

Distributed GridSearch

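import org.apache.spark.sql.SparkSession
import smile._                 // read.arff
import smile.classification._  // knn
import smile.validation._      // Accuracy
// sparkgscv comes from spark-smile (import it from the library's package)

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Load the mushrooms dataset as a SMILE DataFrame
val mushrooms = read.arff("data/mushrooms.arff")

val x = mushrooms.select(1, 22).toArray  // selected feature columns as a numeric matrix
val y = mushrooms("class").toIntArray    // class labels

// 5-fold cross-validation of a 3-NN trainer, scored by accuracy, run with Spark
sparkgscv(spark)(5, x, y, Seq(new Accuracy()): _*) { (x, y) => knn(x, y, 3) }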
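To make concrete what a grid search with k-fold cross-validation computes, here is a minimal sequential sketch in plain Scala with no Spark or SMILE dependency. It scores each candidate value of the kNN parameter k by mean cross-validated accuracy and keeps the best one. This is not the library's implementation: the names (GridSearchSketch, crossValidate, gridSearch), the fold assignment (row i goes to fold i % folds) and the tie-breaking rule are assumptions chosen for illustration.

```scala
object GridSearchSketch {
  // Squared Euclidean distance between two feature vectors
  def sqDist(a: Array[Double], b: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { val d = a(i) - b(i); s += d * d; i += 1 }
    s
  }

  // Illustrative kNN trainer: "training" memorises the data; the returned
  // function predicts by majority vote among the k closest training rows
  // (ties go to the smaller label).
  def knn(x: Array[Array[Double]], y: Array[Int], k: Int): Array[Double] => Int =
    (q: Array[Double]) => {
      val nearest = x.indices
        .map(i => (sqDist(x(i), q), i))
        .sortWith(_._1 < _._1)
        .take(k)
        .map(_._2)
      nearest.groupBy(i => y(i)).maxBy { case (label, idx) => (idx.size, -label) }._1
    }

  // Fraction of predictions that match the true labels
  def accuracy(truth: Array[Int], pred: Array[Int]): Double =
    truth.indices.count(i => truth(i) == pred(i)).toDouble / truth.length

  // k-fold cross-validation: train on all folds but one, predict the held-out
  // fold, and average the per-fold accuracies (assumes folds <= number of rows).
  def crossValidate(folds: Int, x: Array[Array[Double]], y: Array[Int])(
      trainer: (Array[Array[Double]], Array[Int]) => Array[Double] => Int): Double = {
    val scores = (0 until folds).map { f =>
      val test = x.indices.filter(_ % folds == f)
      val train = x.indices.filterNot(_ % folds == f)
      val model = trainer(train.map(i => x(i)).toArray, train.map(i => y(i)).toArray)
      accuracy(test.map(i => y(i)).toArray, test.map(i => model(x(i))).toArray)
    }
    scores.sum / folds
  }

  // Grid search: cross-validate every candidate k and keep the first best (k, score)
  def gridSearch(ks: Seq[Int], folds: Int, x: Array[Array[Double]], y: Array[Int]): (Int, Double) =
    ks.map(k => (k, crossValidate(folds, x, y)((tx, ty) => knn(tx, ty, k))))
      .reduceLeft((best, c) => if (c._2 > best._2) c else best)
}
```

Each (candidate, fold) evaluation is independent of the others, which is what makes this kind of search a natural fit for distribution over a cluster.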

From Spark DataFrame to SMILE DataFrame

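import org.apache.spark.sql.SparkSession
import org.apache.spark.smile.implicits._
import smile.classification._  // knn
import smile.validation._      // Accuracy

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Spark DataFrame with "label" and "features" columns
val mushrooms = spark.read.format("libsvm").load("data/mushrooms.svm")

// Convert once and reuse the SMILE DataFrame
val smileDF = mushrooms.toSmileDF()

val x = smileDF.select("features")
  .map(t => t.getArray[AnyRef](0).map(_.asInstanceOf[Double]))
  .toArray
// Labels in mushrooms.svm are 1 and 2; shift them to 0-based class indices
val y = smileDF("label").toDoubleArray.map(_.toInt - 1)

// SMILE cross-validation of a 3-NN trainer, scored by accuracy
val res = classification(5, x, y, Seq(new Accuracy()): _*) { (x, y) => knn(x, y, 3) }

println(res(0))  // first (and only) measure: Accuracy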
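The `_.toInt - 1` shift works because this dataset's labels are exactly 1 and 2. For data with other label values, a more general approach is to map each distinct label to a 0-based index. A minimal plain-Scala sketch, assuming labels arrive as doubles; `LabelEncoding.encode` is a hypothetical helper, not part of spark-smile or SMILE:

```scala
object LabelEncoding {
  // Assign 0, 1, 2, ... to the distinct label values in ascending order and
  // return both the encoded labels and the value-to-index mapping.
  def encode(labels: Array[Double]): (Array[Int], Map[Double, Int]) = {
    val index: Map[Double, Int] = labels.distinct.sortWith(_ < _).zipWithIndex.toMap
    (labels.map(l => index(l)), index)
  }
}
```

Keeping the mapping lets predictions be translated back to the original label values.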

From SMILE DataFrame to Spark DataFrame

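import org.apache.spark.sql.SparkSession
import org.apache.spark.smile.implicits._
import smile._  // read.arff

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Drop rows with missing values, then convert the SMILE DataFrame to a Spark DataFrame
val mushrooms = read.arff("data/mushrooms.arff").omitNullRows().toSparkDF(spark)

mushrooms.show()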
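`omitNullRows()` keeps only the rows that have no missing value in any column. The filtering idea can be sketched in plain Scala, assuming rows are modelled as `Array[Any]` with `null` marking a missing cell; `OmitNullRows` is a hypothetical name and this is not SMILE's implementation:

```scala
object OmitNullRows {
  // Keep a row only if none of its cells is null
  def omitNullRows(rows: Seq[Array[Any]]): Seq[Array[Any]] =
    rows.filter(row => !row.contains(null))
}
```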

Use a SMILE Classifier (or Regressor) in a Spark MLlib Pipeline

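import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.sql.SparkSession
import smile.classification._  // knn
// SmileClassifier and SmileClassificationModel come from spark-smile

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val data = spark.read.format("libsvm").load("data/mushrooms.svm")

// Use a SMILE trainer (3-NN) as a Spark ML classifier
val scl = new SmileClassifier()
  .setTrainer({ (x, y) => knn(x, y, 3) })

val bce = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .setRawPredictionCol("rawPrediction")

val model = scl.fit(data)

// Evaluated on the training data for brevity
println(bce.evaluate(model.transform(data)))

// Save the fitted model, reload it and check it scores the same
model.write.overwrite().save("/tmp/bonjour")
val loaded = SmileClassificationModel.load("/tmp/bonjour")
println(bce.evaluate(loaded.transform(data)))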
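By default, `BinaryClassificationEvaluator` reports the area under the ROC curve, computed from the scores in the raw prediction column. The sketch below computes that quantity exactly with the pairwise (Mann-Whitney) formulation: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counting one half. It is a plain-Scala illustration assuming labels 0 (negative) and 1 (positive); Spark's own computation builds the ROC curve from thresholds (possibly binned), so results may differ slightly on large data. `RocAuc.areaUnderROC` is a hypothetical name.

```scala
object RocAuc {
  // Exact AUC via pairwise comparison: O(P * N) in the number of positives
  // and negatives, fine for a sketch; returns NaN if either class is absent.
  def areaUnderROC(scores: Seq[Double], labels: Seq[Int]): Double = {
    val pos = scores.zip(labels).collect { case (s, 1) => s }
    val neg = scores.zip(labels).collect { case (s, 0) => s }
    val wins = for (p <- pos; n <- neg) yield (if (p > n) 1.0 else if (p == n) 0.5 else 0.0)
    wins.sum / (pos.size.toDouble * neg.size)
  }
}
```

A value of 1.0 means every positive is scored above every negative; 0.5 is what random scores give on average.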

Contributing

Feel free to open an issue or make a pull request to contribute to the repository.

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the Apache License Version 2.0 - see the LICENSE file for details.