Library for computing tables (tabulations and cross-tabulations) and histogram data in a format amenable to plotting
Redshift data source for Spark
TeraSort-like benchmark for Spark 2.x that uses DataFrames, saves files in Parquet, etc., for more realistic testing.
XGBoost4J for Scala with Mac and Linux binaries
Scala reference implementation for Bundle.ML serializers
Various convenience routines, functions, and tricks for working with Spark.
AMQP data source for DStreams (Spark Streaming)
Scala-based DSLink implementation for Apache Spark
Extension library for Spark ML that implements calculation algorithms
Distributed exome CNV analyzer. Apache 2 licensed.
Avro support for Spark, SQL, and DataFrames
Use standard Scala collections to unit test your Spark code.