-
microsoft/synapseml 1.0.8
Simple and Distributed Machine Learning
Scala versions: 2.12
-
feathr-ai/feathr 1.0.0
Feathr – A scalable, unified data and AI engineering platform for enterprise
Scala versions: 2.12
-
lucacanali/sparkmeasure 0.24
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Scala versions: 2.13 2.12
-
hydrospheredata/mist 0.6.4
Serverless proxy for Spark cluster
Scala versions: 2.11 2.10
-
azure/azure-event-hubs-spark 2.1.5
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Scala versions: 2.11
-
azure/azure-cosmosdb-spark 3.7.0
Apache Spark Connector for Azure Cosmos DB
Scala versions: 2.11
-
treeverse/lakefs 0.14.1
lakeFS - Data version control for your data lake | Git for data
Scala versions: 2.12
-
streamnative/pulsar-spark 2.4.5
Spark Connector to read and write with Pulsar
Scala versions: 2.11
-
microsoft/mobius 2.0.200
C# and F# language binding and extensions to Apache Spark
Scala versions: 2.11
-
chermenin/spark-states 0.2
Custom state store providers for Apache Spark
Scala versions: 2.12 2.11
-
sansa-stack/sansa-stack 0.9.5
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Scala versions: 2.12
-
swoop-inc/spark-records 3.0.1
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Scala versions: 2.12
-
databrickslabs/automl-toolkit 0.7.2
Toolkit for Apache Spark ML covering feature clean-up, a feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
Scala versions: 2.11
-
uosdmlab/spark-nkp 0.3.3
Natural Korean Processor for Apache Spark
Scala versions: 2.11
-
dataflint/spark 0.2.6
Performance Observability for Apache Spark
Scala versions: 2.13 2.12
-
coxautomotivedatasolutions/spark-distcp 0.2.5
A re-implementation of Hadoop DistCP in Apache Spark
Scala versions: 2.13
-
absaoss/hyperdrive 4.7.0
Extensible streaming ingestion pipeline on top of Apache Spark
Scala versions: 2.12 2.11
-
tupol/spark-utils 0.6.2
Basic framework utilities to quickly start writing production ready Apache Spark applications
Scala versions: 2.12
-
heartsavior/spark-state-tools 0.4.0
Spark Structured Streaming State Tools
Scala versions: 2.12 2.11