Schema registry for CSV, TSV, JSON, Avro and Parquet schemas. Supports schema inference and a GraphQL API.
Spark RDD-based implementation of the word2phrase algorithm
A refreshing treatment for all quality control ailments. Apache 2 licensed.
Hadoop Crypto Ledger - Analyzing crypto ledgers, such as the Bitcoin blockchain, on Big Data platforms such as Hadoop/Spark/Flink/Hive
Scala implementation of Histogrammar, with optional front-ends and back-ends as separate Maven projects.
A Spark package for retrieving data from Google Analytics
Collection of libraries for quantitative and financial computation.
Spark Library for Bulk Loading into Cassandra
Spark Library for Bulk Loading into Elasticsearch
Apache Spark Data Source for ROOT File Format
A library that converts between nested Datasets and flattened DataFrames
Spark data source for Salesforce
A Play module for running Livy jobs, which run code on a remote Spark session.
Google BigQuery support for Spark, SQL, and DataFrames
CSV data source for Spark SQL and DataFrames
A library you can include in your Spark job to validate counters and perform operations on success. The goal is Scala/Java/Python support.
Data Quality Monitoring Tool
Spark data source for Workday