C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.
GeoTrellis for PySpark
SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.
Connectors for Delta Lake
Geospatial Raster support for Spark DataFrames
Spark-based approximate nearest neighbor search using locality-sensitive hashing
Profile and monitor your ML data pipeline end-to-end
A framework for writing Spark 2.x applications in a pretty way
MLeap allows for easily putting Spark ML pipelines into production
The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large-scale machine learning workflows.
Secondary sort and streaming reduce for Apache Spark
SANSA RDF Library
A library you can include in your Spark job to validate the counters and perform operations on success. The goal is Scala/Java/Python support.
A Variant Caller, Distributed. Apache 2 licensed.
A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Distributed Matrix Library
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Axle Domain Specific Language for Scientific Cloud Computing and Visualization
Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
A tool for hyperparameter optimization of machine learning models