Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Custom state store providers for Apache Spark
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
Spark connector for SFTP
Avro Data Source for Apache Spark
Basic framework utilities to quickly start writing production-ready Apache Spark applications
SANSA Query Layer
Use the Scala API to read and write data from different databases (HBase, MySQL, etc.)
The official Riak Spark Connector for Apache Spark with Riak TS and Riak KV
This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Scala Library/REPL for Machine Learning Research
Connect Spark to HBase for reading and writing data with ease
Spark library for easy MongoDB access
SDK for open-source frameworks to interact with MaxCompute
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
ETL Library for Machine Learning - data pipelines, data munging and wrangling
Creating reusable workflows for Apache Spark
Spark-based implementation of the Topological Mapper algorithm
dllib is a distributed deep learning library running on Apache Spark