The official Riak Spark Connector for Apache Spark with Riak TS and Riak KV
SANSA Query Layer
Approximate Nearest Neighbors in Spark
This project generalizes the Spark MLlib Batch and Streaming K-Means clusterers in every practical way.
Basic framework utilities to quickly start writing production-ready Apache Spark applications
SDK for open source frameworks to interact with MaxCompute
Spark-based approximate nearest neighbor search using locality-sensitive hashing
Use the Scala API to read/write data from different databases: HBase, MySQL, etc.
Big Data Toolkit for the JVM
An implementation of DBSCAN running on top of Apache Spark
dllib is a distributed deep learning library running on Apache Spark
Big Spatial Data Processing using Spark
General Vectorization Lib for Machine Learning Tools
Scala library for scraping metadata from specified URLs (e.g. OpenGraph)
An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.
Plug-and-play implementation of an Apache Spark custom data source for AWS DynamoDB.
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive