The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large-scale machine learning workflows.
A library you can include in your Spark job to validate the counters and perform operations on success. The goal is Scala/Java/Python support.
A Variant Caller, Distributed. Apache 2 licensed.
This project aims to make writing Spark applications easier by abstracting the effort to assemble the driver into reusable steps and pipelines.
Dependency and data pipeline management framework for Spark and Scala
Ensemble Learning for Apache Spark 🌲
Machine learning for genomic variants
A connector for Apache Spark to access Exasol
Spark data source for Salesforce
Data quality tools for Big Data
JSON schema parser for Apache Spark
Read a Spark SQL Parquet file as RDD[Protobuf]
Integrating SMILE and Spark
A library to query heterogeneous data sources uniformly using SPARQL
Scala and Spark library focused on reading OpenStreetMap Pbf files.
Schema registry for CSV, TSV, JSON, Avro and Parquet schemas. Supports schema inference and a GraphQL API.
An extension to the amazing Spark framework for better functional programming.
Spark-Transformers: Library for exporting Apache Spark MLlib models for use in any Java application with no other dependencies.
Google Spreadsheets data source for Spark SQL and DataFrames