-
whylabs/whylogs-java
Profile and monitor your ML data pipeline end-to-end
Scala (JVM): 2.11 2.12 -
azure/azure-cosmosdb-spark
Apache Spark Connector for Azure Cosmos DB
Scala (JVM): 2.10 2.11 -
azure/azure-event-hubs-spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Scala (JVM): 2.11 2.12 -
googleclouddataproc/spark-bigquery-connector
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Scala (JVM): 2.11 2.12 -
leobenkel/zparkio
Boilerplate framework for using Spark and ZIO together.
Scala (JVM): 2.11 -
aliyun/aliyun-emapreduce-datasources
Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.
Scala (JVM): 2.10 2.11 -
absaoss/abris
Avro SerDe for Apache Spark structured APIs.
Scala (JVM): 2.11 2.12 -
helgeho/archivespark
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
Scala (JVM): 2.11 -
zouzias/spark-lucenerdd
Spark RDD with Lucene's query and entity linkage capabilities
Scala (JVM): 2.10 2.11 -
locationtech-labs/geopyspark
GeoTrellis for PySpark
Scala (JVM): 2.11 -
jelmerk/hnswlib
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Scala (JVM): 2.11 2.12 2.13 -
clustering4ever/clustering4ever
C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.
Scala (JVM): 2.11 -
sparkling-graph/sparkling-graph
SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.
Scala (JVM): 2.10 2.11 -
locationtech/rasterframes
Geospatial Raster support for Spark DataFrames
Scala (JVM): 2.11