This library makes Apache Spark's expression language available as a standalone library to evaluate SQL expressions on Scala case classes.
All major dependencies are shaded into the spark-expressions-standalone jar file with relocated package names. There is therefore no runtime dependency on Apache Spark (shading), and classpath/version conflicts are avoided as well (relocation).
case class Entry(y: String, z: Int)
case class TestObj(s: Seq[Entry], b: String)
val input = TestObj(s = Seq(Entry("test0",0), Entry("test1",1)), b = "ok")
val evaluator = SparkExpressionEvaluatorFactory.getEvaluator[TestObj,String]("concat(s[0].y, '-', b)")
val result = evaluator.apply(input)
assert(result == "test0-ok")
It supports most Spark SQL functions, except aggregate functions. Additionally, it's possible to register Scala functions as user-defined functions and use them in expressions too:
val funcFirstElementY = (entries: Seq[Entry]) => entries.map(_.y).head
SparkExpressionEvaluatorFactory.registerUdf("get_first_element", funcFirstElementY)
val evaluator = SparkExpressionEvaluatorFactory.getEvaluator[TestObj,String]("get_first_element(s)")
val result = evaluator.apply(input)
assert(result == "test0")
<dependency>
  <groupId>ch.zzeekk.spark</groupId>
  <artifactId>spark-expressions-standalone_${scala.minor.version}</artifactId>
  <version>-PUT-VERSION-HERE-</version>
</dependency>
Note: the major and minor version of the library reflect the Apache Spark version it's based on, e.g. library version 3.5.0 is based on Apache Spark 3.5.x.
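Since most non-aggregate Spark SQL functions are supported, more complex expressions such as higher-order functions on arrays should work as well. The following sketch reuses the `TestObj`/`Entry` case classes and the `SparkExpressionEvaluatorFactory` API from the examples above; the choice of the `transform` and `filter` functions here is an assumption based on standard Spark SQL, not something verified against this library:

// Sketch: evaluating a higher-order Spark SQL function on a case class.
// Assumes the same imports and case classes as in the examples above.
case class Entry(y: String, z: Int)
case class TestObj(s: Seq[Entry], b: String)

val input = TestObj(s = Seq(Entry("test0", 0), Entry("test1", 1)), b = "ok")

// transform(...) maps over the array field s, incrementing each z value;
// the result type parameter must match the expression's return type.
val incEvaluator = SparkExpressionEvaluatorFactory
  .getEvaluator[TestObj, Seq[Int]]("transform(s, x -> x.z + 1)")
val incremented = incEvaluator.apply(input)
// expected: Seq(1, 2)

// filter(...) keeps only entries with z > 0, returning a sub-array.
val filterEvaluator = SparkExpressionEvaluatorFactory
  .getEvaluator[TestObj, Seq[Entry]]("filter(s, x -> x.z > 0)")
val filtered = filterEvaluator.apply(input)
// expected: Seq(Entry("test1", 1))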