hammerlab / spark-util   3.1.0

Apache License 2.0

low-level helpers for Apache Spark libraries and tests

Scala versions: 2.12 2.11 2.10

spark-util

Spark, Hadoop, and Kryo utilities

Kryo registration

Classes that implement the Registrar interface can use various shorthands for registering classes with Kryo.

Adapted from RegistrationTest:

register(
  cls[A],                  // comes with an AlsoRegister that loops in other classes
  arr[Foo],                // register a class and an Array of that class
  cls[B] → BSerializer(),  // use a custom Serializer
  CDRegistrar              // register all of another Registrar's registrations
)

  • Custom Serializers and AlsoRegisters are picked up implicitly if not provided explicitly.
  • AlsoRegisters are recursive, which makes it easier to track what is registered and why, and helps ensure that needed registrations aren't overlooked.
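
The recursive behaviour described above can be pictured with a small, dependency-free sketch. This is a toy model and not the library's implementation: `AlsoRegisterSketch`, `resolve`, the `also` map, and the classes `A`, `B`, `C` are hypothetical names used only for illustration, and no Kryo instance is involved. It assumes each class may name further classes that must be registered alongside it, and it walks those links transitively.

```scala
import scala.collection.mutable

// Hypothetical example classes (not part of spark-util).
class A
class B
class C

// Toy model of recursive "also register" resolution; NOT the library's code.
object AlsoRegisterSketch {
  // `also` maps a class to the classes that should be registered whenever it is.
  // Returns every class reachable from `roots`, in first-visit order.
  def resolve(
    roots: Seq[Class[_]],
    also: Map[Class[_], Seq[Class[_]]]
  ): Seq[Class[_]] = {
    val seen = mutable.LinkedHashSet.empty[Class[_]]
    // `add` returns false for classes already visited, so cycles terminate.
    def visit(c: Class[_]): Unit =
      if (seen.add(c)) also.getOrElse(c, Nil).foreach(visit)
    roots.foreach(visit)
    seen.toSeq
  }
}
```

With `A` pulling in `B`, and `B` pulling in `C` (and, cyclically, `A` again), resolving from `A` alone yields `A`, `B`, `C`: registering the top-level class is enough to cover everything it depends on.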

Configuration/Context wrappers

  • Configuration: serializable Hadoop-Configuration wrapper
  • Context: SparkContext wrapper that is also a Hadoop Configuration, unifying "global configuration access" patterns
  • Conf: load a SparkConf with settings from file(s) specified in the SPARK_PROPERTIES_FILES environment variable
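
As a rough illustration of the idea behind Conf, the sketch below reads the SPARK_PROPERTIES_FILES environment variable and merges the named properties files into a plain key/value map, using only the JDK. It is not the library's code: `PropertiesFilesSketch`, `parsePaths`, `load`, and `fromEnv` are hypothetical names, and two behaviours are assumptions the README does not state, namely that multiple paths are comma-separated and that later files override earlier ones.

```scala
import java.io.FileInputStream
import java.util.Properties

// Hypothetical sketch; not spark-util's Conf implementation.
object PropertiesFilesSketch {
  // ASSUMPTION: paths are comma-separated; the README does not name the delimiter.
  def parsePaths(value: String): Seq[String] =
    value.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  // Load each file in order.
  // ASSUMPTION: on duplicate keys, later files override earlier ones.
  def load(paths: Seq[String]): Map[String, String] =
    paths.foldLeft(Map.empty[String, String]) { (acc, path) =>
      val props = new Properties()
      val in = new FileInputStream(path)
      try props.load(in) finally in.close()
      val keys = props.stringPropertyNames().toArray(new Array[String](0))
      acc ++ keys.map(k => k -> props.getProperty(k)).toSeq
    }

  // Returns an empty map when the environment variable is unset.
  def fromEnv(): Map[String, String] =
    load(sys.env.get("SPARK_PROPERTIES_FILES").map(parsePaths).getOrElse(Nil))
}
```

The resulting pairs could then be applied to a SparkConf, for example with its `setAll` method; the real Conf helper presumably does the equivalent internally.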

Spark Configuration

Misc