MLeap: Deploy ML Pipelines to Production
A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large-scale machine learning workflows.
Avro SerDe for Apache Spark structured APIs.
Framework to quickly build and maintain Smart Data Lakes
Arc is an opinionated framework for defining data pipelines that are predictable, repeatable and manageable.
The Almaren Framework provides a simplified, consistent, minimalistic layer over Apache Spark while still allowing you to take advantage of native Apache Spark features. You can also combine it with standard Spark code.
Basic framework utilities to quickly start writing production-ready Apache Spark applications
Executable Apache Spark Tools: Format Converter & SQL Processor
An E2E test tool for Enceladus. Also a general DataFrame comparison tool
Provides the DebeziumTransform stage
Modified Spark code for SmartDataLakeBuilder
Provides KafkaExtract, KafkaLoad and KafkaCommitExecute stages