-
projectglow/glow 2.0.0
An open-source toolkit for large-scale genomic analysis
Scala versions: 2.12 -
swoop-inc/spark-alchemy 1.2.1
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Scala versions: 2.12 -
linkedin/isolation-forest 3.2.7
A distributed Spark/Scala implementation of the isolation forest algorithm for unsupervised outlier detection, featuring support for scalable training and ONNX export for easy cross-platform inference.
Scala versions: 2.13 2.12 -
aws/sagemaker-spark spark_2.4.0-1.4.2.dev0
A Spark library for Amazon SageMaker.
Scala versions: 2.11 -
setl-framework/setl 1.0.0-SNAPSHOT
A simple Spark-powered ETL framework that just works 🍺
Scala versions: 2.12 2.11 -
azure/azure-cosmosdb-spark 3.7.0
Apache Spark Connector for Azure Cosmos DB
Scala versions: 2.11 -
clickhouse/spark-clickhouse-connector 0.8.1
Spark ClickHouse Connector built on the DataSourceV2 API
Scala versions: 2.13 2.12 -
leobenkel/zparkio 0.10.0
Boilerplate framework for using Spark and ZIO together.
Scala versions: 2.11 -
sparkling-graph/sparkling-graph 0.0.7
SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.
Scala versions: 2.11 2.10 -
clustering4ever/clustering4ever 0.11.0
C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.
Scala versions: 2.11 -
qubole/kinesis-sql 1.2.0_spark-2.4
Kinesis Connector for Structured Streaming
Scala versions: 2.11 -
zouzias/spark-lucenerdd 0.4.0
Spark RDD with Lucene's query and entity linkage capabilities
Scala versions: 2.12 -
streamnative/pulsar-spark 2.4.5
Spark Connector to read and write with Pulsar
Scala versions: 2.11 -
smart-data-lake/smart-data-lake 2.7.1
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Scala versions: 2.13 2.12 -
aliyun/aliyun-emapreduce-datasources 2.2.0
Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.
Scala versions: 2.11 -
indix/schemer 0.4.3
Schema registry for CSV, TSV, JSON, Avro, and Parquet schemas. Supports schema inference and a GraphQL API.
Scala versions: 2.11