Spark SMILE

This repository provides better integration between Spark MLlib Pipelines and the SMILE library.

Setup

Add the dependency from Maven Central to your build.

SBT

libraryDependencies += "com.github.pierrenodet" %% "spark-smile" % "0.0.2"

Maven

<dependency>
  <groupId>com.github.pierrenodet</groupId>
  <artifactId>spark-smile_2.12</artifactId>
  <version>0.0.2</version>
</dependency>
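
For reference, sbt's `%%` operator appends the project's Scala binary version to the artifact name, which is why the Maven artifact above is spelled `spark-smile_2.12`. A minimal sketch of the equivalent explicit form in sbt, assuming a Scala 2.12 build (the exact `scalaVersion` below is an assumption):

```scala
// build.sbt
// Assumption: any Scala 2.12.x release; the artifact suffix must match it.
scalaVersion := "2.12.10"

// A single % with the suffix written out resolves the same artifact as %% on Scala 2.12.
libraryDependencies += "com.github.pierrenodet" % "spark-smile_2.12" % "0.0.2"
```

Using `%%` is usually preferable, since the suffix then follows the build's Scala version automatically.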

What's inside

This repository contains:

  • Distributed grid search over SMILE trainers with Spark
  • Integration of SMILE with Spark MLlib Pipelines
  • Seamless interoperability between SMILE and Spark DataFrames

How to use

Distributed GridSearch

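import org.apache.spark.sql.SparkSession

// Local Spark session using all available cores
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Load the mushrooms dataset with SMILE
val mushrooms = read.arff("data/mushrooms.arff")

// Feature matrix and integer class labels
val x = mushrooms.select(1,22).toArray
val y = mushrooms("class").toIntArray

// Evaluate a 3-nearest-neighbours trainer with Spark, scoring by accuracy
sparkgscv(spark)(5, x, y, Seq(new Accuracy()): _*) { (x, y) => knn(x, y, 3) }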
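
The name `sparkgscv` suggests grid search with cross-validation. As a rough illustration of what a k-fold evaluation computes, and not the library's implementation, here is a minimal plain-Scala sketch. It assumes the leading `5` is the number of folds; the `Trainer` alias, the `kFoldAccuracy` helper, and the majority-class trainer are hypothetical stand-ins that need neither Spark nor SMILE.

```scala
object KFoldSketch {
  // A trainer turns training data into a prediction function, playing the role of
  // the `(x, y) => knn(x, y, 3)` closure above (hypothetical type alias).
  type Trainer = (Array[Array[Double]], Array[Int]) => Array[Double] => Int

  // Hypothetical stand-in for a real model: always predicts the most common training label.
  val majority: Trainer = (_, y) => {
    val label = y.groupBy(identity).maxBy(_._2.length)._1
    _ => label
  }

  // Mean accuracy over k folds; sample i is held out in fold i % k.
  def kFoldAccuracy(k: Int, x: Array[Array[Double]], y: Array[Int])(trainer: Trainer): Double = {
    val scores = (0 until k).map { fold =>
      val (testIdx, trainIdx) = x.indices.partition(_ % k == fold)
      val model = trainer(trainIdx.map(x(_)).toArray, trainIdx.map(y(_)).toArray)
      val hits = testIdx.count(i => model(x(i)) == y(i))
      hits.toDouble / testIdx.size
    }
    scores.sum / k
  }

  def main(args: Array[String]): Unit = {
    // Made-up data: ten one-dimensional points, seven of class 0 and three of class 1.
    val x = Array.tabulate(10)(i => Array(i.toDouble))
    val y = Array(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
    println(kFoldAccuracy(5, x, y)(majority))
  }
}
```

Each fold trains on the samples outside the fold and scores on the held-out ones, so every sample is used for testing exactly once.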

From Spark DataFrame to SMILE DataFrame

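import org.apache.spark.smile.implicits._

// Load the dataset as a Spark DataFrame
val mushrooms = spark.read.format("libsvm").load("data/mushrooms.svm")

// Convert to a SMILE DataFrame, then unbox each row's feature array into doubles
val x = mushrooms.toSmileDF().select("features").map(t => t.getArray[AnyRef](0).map(_.asInstanceOf[Double])).toArray
// Convert the labels to Int and shift them down by one
val y = mushrooms.toSmileDF().apply("label").toDoubleArray.map(_.toInt - 1)

// Validate a 3-nearest-neighbours classifier with SMILE, scoring by accuracy
val res = classification(5, x, y, Seq(new Accuracy()): _*) { (x, y) => knn(x, y, 3) }

println(res(0))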
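
The two conversion lines above are dense. Here is a minimal plain-Scala sketch of the same two steps on made-up values, assuming each row's `features` entry arrives as an array of boxed values and that labels are stored as doubles. The `unbox` and `shiftLabels` helpers and the sample data are hypothetical.

```scala
object ConversionSketch {
  // Unbox an array of boxed values into primitive doubles, mirroring
  // `t.getArray[AnyRef](0).map(_.asInstanceOf[Double])` for a single row.
  def unbox(row: Array[AnyRef]): Array[Double] = row.map(_.asInstanceOf[Double])

  // Truncate double labels to Int and subtract one, mirroring `.map(_.toInt - 1)`.
  def shiftLabels(labels: Array[Double]): Array[Int] = labels.map(_.toInt - 1)

  def main(args: Array[String]): Unit = {
    // Made-up feature row and labels, for illustration only.
    val row: Array[AnyRef] = Array(Double.box(1.0), Double.box(0.0), Double.box(1.0))
    println(unbox(row).mkString(", "))
    println(shiftLabels(Array(1.0, 2.0, 2.0)).mkString(", "))
  }
}
```

The cast works because `asInstanceOf[Double]` unboxes a `java.lang.Double`; it would throw a `ClassCastException` if a row held any other boxed type.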

From SMILE DataFrame to Spark DataFrame

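import org.apache.spark.sql.SparkSession
import org.apache.spark.smile.implicits._

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Read with SMILE, drop rows containing nulls, then convert to a Spark DataFrame
val mushrooms = read.arff("data/mushrooms.arff").omitNullRows().toSparkDF(spark)

mushrooms.show()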

Use a SMILE Classifier (or Regressor) in a Spark MLlib Pipeline

import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

val data = spark.read.format("libsvm").load("data/mushrooms.svm")

// Wrap a SMILE trainer (3-nearest-neighbours) as a Spark ML estimator
val scl = new SmileClassifier()
  .setTrainer({ (x, y) => knn(x, y, 3) })

val bce = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .setRawPredictionCol("rawPrediction")

val model = scl.fit(data)

println(bce.evaluate(model.transform(data)))

// Save the fitted model, reload it, and evaluate it again
model.write.overwrite().save("/tmp/bonjour")
val loaded = SmileClassificationModel.load("/tmp/bonjour")
println(bce.evaluate(loaded.transform(data)))

Contributing

Feel free to open an issue or make a pull request to contribute to the repository.

Authors

See the list of contributors who participated in this project.

License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.