A framework for writing clean, readable Spark 2.x applications
The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large-scale machine learning workflows.
SANSA RDF Library
A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
An RPC framework leveraging the Spark RPC module
SBT plugin for Apache Spark on AWS EMR
Custom state store providers for Apache Spark
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Writing application logic for Spark jobs that can be unit-tested without a SparkContext
A simple Spark-powered ETL framework that just works 🍺
Building an Annoy index on Apache Spark
Scala and Spark library focused on reading OpenStreetMap PBF files.
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Spark functions to run popular phonetic and string matching algorithms
Natural Korean Processor for Apache Spark
Schema registry for CSV, TSV, JSON, Avro, and Parquet schemas. Supports schema inference and a GraphQL API.
Google Spreadsheets data source for Spark SQL and DataFrames
PageRank in Spark