Snowflake Data Source for Apache Spark.
Calliope is a library integrating Cassandra with the Spark framework.
A library for SQL optimization and rewriting, including materialized view rewriting.
Read and write TensorFlow TFRecord data from Apache Spark.
PyJava is an ongoing effort to enable data exchange between Java/Scala and Python. It uses Apache Arrow as the data exchange format.
Spark RDD with Lucene's query and entity linkage capabilities
GeoTrellis is a geographic data processing engine for high performance applications.
Tranquility helps you send real-time event streams to Druid and handles partitioning, replication, service discovery, and schema rollover, seamlessly and without downtime.
GeoMesa is a suite of tools for working with big geo-spatial data in a distributed fashion.
Geospatial Raster support for Spark DataFrames
A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Infinispan Spark Connector
CSV Data Source for Apache Spark 1.x
Lightweight Scala kernel for Jupyter / IPython 3
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
MLeap makes it easy to put Spark ML pipelines into production.
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Distributed Matrix Library
A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Quasar Analytics is a general-purpose compiler for translating data processing and analytics over semi-structured data into efficient plans that run 100% in the target infrastructure.