An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Base classes to use when writing tests with Spark
The Programming Language Designed For Big Data and AI
MLeap: Deploy Spark Pipelines to Production
GeoMesa is a suite of tools for working with big geo-spatial data in a distributed fashion.
Project SnappyData - a memory-optimized analytics database based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
Easy access to big things. Library for Apache Spark extending and improving its capabilities
An open-source toolkit for large-scale genomic analysis
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.
A library for querying Binlog with Apache Spark Structured Streaming, for Spark SQL, DataFrames, and [MLSQL](https://www.mlsql.tech).
Connectors for Delta Lake
MLeap allows for easily putting Spark ML pipelines into production
Showcase for IoT Platform Blog
A library based on Delta Lake for Spark and MLSQL
A library for SQL optimization and rewriting, including materialized view rewriting
Spark Structured Streaming State Tools
This library is an ongoing effort to enable data exchange between Java/Scala and Python. PyJava uses Apache Arrow as the data exchange format.
Kafka offset committer for Structured Streaming queries