Spark Structured Streaming State Tools
Arc is an opinionated framework for defining data pipelines which are predictable, repeatable and manageable.
Infinispan Spark Connector
Comet Data Pipeline is a Spark-based on-premises and cloud ingestion framework for batch and streaming (coming soon) systems
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
A neural network library that trains models on Spark RDD instances.
Spark package to "plug" holes in data using SQL-based rules ⚡️ 🔌
This library is an ongoing effort to enable data exchange between Java/Scala and Python. PyJava uses Apache Arrow as the data exchange format.
Framework to quickly build and maintain Smart Data Lakes
General utility code used across BDG products. Apache 2 licensed.
Spline agent for Apache Spark
InfluxDB connector to Apache Spark on top of Chronicler
A neural network implementation in Scala
A library for Spark DataFrames using the MinIO Select API
The Almaren Framework provides a simplified, consistent, minimalistic layer over Apache Spark while still allowing you to take advantage of native Apache Spark features. You can also combine it with standard Spark code.
AMQP data source for DStreams (Spark Streaming)
Load genomic BAM files using Apache Spark
Spark receiver for Amazon SQS queues
Miscellaneous functionality for manipulating Apache Spark RDDs.
Basic framework utilities to quickly start writing production-ready Apache Spark applications