SparkMeasure is a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
A Spark datasource for the HadoopCryptoLedger library
Apache Spark test helper functions with pretty error messages
Spark Marketo Connector
Schema registry for CSV, TSV, JSON, Avro, and Parquet schemas. Supports schema inference and a GraphQL API.
Spark RDD-based implementation of the word2phrase algorithm
A refreshing treatment for all quality control ailments. Apache 2 licensed.
Hadoop Crypto Ledger - Analyzing crypto ledgers, such as the Bitcoin blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Scala implementation of Histogrammar, with optional front-ends and back-ends as separate Maven projects.
A Spark package for retrieving data from Google Analytics
Collection of libraries for quantitative and financial computation.
Spark Library for Bulk Loading into Cassandra
Spark Library for Bulk Loading into Elasticsearch
Apache Spark Data Source for ROOT File Format
A library that converts between nested Datasets and flattened DataFrames
Spark data source for Salesforce
A Play module for running Livy jobs, which run code on a remote Spark session.
Google BigQuery support for Spark, SQL, and DataFrames