Byzer (formerly MLSQL): A low-code open-source programming language for data pipelines, analytics and AI.
Upserts, Deletes And Incremental Processing on Big Data.
Lightweight real-time big data streaming engine over Akka
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
TiSpark is built for running Apache Spark on top of TiDB/TiKV
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.
A library for Spark DataFrame using MinIO Select API
C# and F# language binding and extensions to Apache Spark
🚀 Validation DSL for data pipelines
This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
A remote CLI for MapR
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Mapflablup is a library to flatten ➖ and blow up 🎈 Map collections
Kotlin Bigdata Toolkit
Gets the fields and tables used in a SQL statement
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive