Spark functions to run popular phonetic and string matching algorithms
Parquet-based ML data format optimized for working with unstructured data
Optics for Spark DataFrames
Deriving Spark DataFrame schemas from case classes
EtlFlow is an ecosystem of functional Scala libraries, based on ZIO, for writing tasks and jobs, scheduling those jobs, and monitoring them through a UI.
A schema-aware Scala library for data transformation
Spark-Transformers: a library for exporting Apache Spark MLlib models so they can be used in any Java application with no other dependencies.
Scala client for Amazon Kinesis. It also lets Apache Spark and Spark Streaming jobs write to Kinesis.
A Spark datasource for the HadoopOffice library
dllib is a distributed deep learning library running on Apache Spark
Spark MLlib wrapper for the Snowball framework
A connector for Apache Spark and PySpark to Dgraph databases.
SANSA Query Layer
A library that brings useful functions from various modern database management systems to Apache Spark
An Extensible Data Skipping Framework
Bucketing and partitioning system for Parquet
Extensible streaming ingestion pipeline on top of Apache Spark
A general inference API built on two of the most popular big data processing engines: Apache Spark and Apache Flink
Apache Spark Data Source for ROOT File Format