Data Quality Monitoring Tool
Ensemble Learning for Apache Spark 🌲
Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines
Rapid development of ETL/ELT connectors and pipelines on top of Apache Spark
RocksDB state storage implementation for Structured Streaming.
A Spark datasource for the HadoopCryptoLedger library
Spark-based implementation of the Topological Mapper algorithm
Starlake is a Spark-based on-premise and cloud ELT/ETL framework for batch and stream processing systems
Apache Spark Sentry Integration
A connector for Apache Spark to access Exasol
A library for reading data from Amazon S3 with optimised listing via Amazon SQS, using Spark SQL Streaming (or Structured Streaming).
Scala API for Apache Spark SQL higher-order functions
Executable Apache Spark Tools: Format Converter & SQL Processor
:sparkles: Spark ML implementation of SOM algorithm (Kohonen self-organizing map)
Type safety for Spark columns
Java implementation of probabilistic data structures.
Native Spark OSM PBF data source
An E2E test tool for Enceladus. Also a general DataFrame comparison tool
Distributed k-mer counting and analysis on Apache Spark.