GeoTrellis for PySpark
Spark RDD with Lucene's query and entity linkage capabilities
Custom classes and sinks for Spark metrics (e.g. Prometheus)
SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.
Spark-based approximate nearest neighbor search using locality-sensitive hashing
Geospatial Raster support for Spark DataFrames
C4E, a Scala or Spark library for local and distributed Clustering.
Snowflake Data Source for Apache Spark.
MLeap allows for easily putting Spark ML pipelines into production
A framework for writing Spark 2.x applications in a pretty way
Secondary sort and streaming reduce for Apache Spark
A library you can include in your Spark job to validate the counters and perform operations on success. The goal is Scala/Java/Python support.
A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Distributed Matrix Library
A Variant Caller, Distributed. Apache 2 licensed.
SANSA RDF Library
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
A tool for hyperparameter optimization of machine learning models
Axle Domain Specific Language for Scientific Cloud Computing and Visualization
Boilerplate framework for using Spark and ZIO together.