A package for dealing with crowdsourced big data.
Dependency and data pipeline management framework for Spark and Scala
Various convenience routines, functions, and tricks for working with Spark.
ECS connector for Apache Spark
TeraSort-like benchmark for Spark 2.x that uses DataFrames, saves files in Parquet, etc., for more realistic testing.
Unofficial Cassandra sink for Spark Structured Streaming
Spark connector for RSS and HTML sources.
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
A Spark package to approximate the diameter of large graphs
Avro Data Source for Apache Spark
CSV Data Source for Apache Spark 1.x
Tranquility helps you send real-time event streams to Druid and handles partitioning, replication, service discovery, and schema rollover, seamlessly and without downtime.
Lightweight Scala kernel for Jupyter / IPython 3
Apache Spark Extensions
AMQP data source for DStream (Spark Streaming)
A library for reading public web news results from Bing Custom Search using Spark Streaming.
A library for reading social data from Facebook using Spark Streaming.
Scala library for scraping metadata from specified URLs (e.g. OpenGraph)
Spark-based approximate nearest neighbor search using locality-sensitive hashing