-
mjakubowski84/parquet4s 2.20.0
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Scala versions: 3.x 2.13 2.12 -
smart-data-lake/smart-data-lake 2.7.1
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Scala versions: 2.13 2.12 -
aliyun/aliyun-emapreduce-datasources 2.2.0
Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.
Scala versions: 2.11 -
coxautomotivedatasolutions/spark-distcp 0.2.5
A re-implementation of Hadoop DistCp in Apache Spark
Scala versions: 2.13 -
agile-lab-dev/darwin 1.2.2
Avro Schema Evolution made easy
Scala versions: 2.13 2.12 2.11 2.10 -
izeigerman/akkeeper 0.4.11
An easy way to deploy your Akka services to a distributed environment.
Scala versions: 2.12 2.11 -
agile-lab-dev/wasp 2.35.0
WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
Scala versions: 2.12 2.11 -
apache/incubator-wayang 0.7.1
Apache Wayang (incubating) is the first cross-platform data processing system.
Scala versions: 2.12 2.11 -
romans-weapon/spear-framework 3.1.1-3.0
Rapid development of ETL/ELT connectors and pipelines on top of Apache Spark
Scala versions: 2.12 -
zuinnote/hadoopoffice
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
-
h2oai/h2o-3 3.30.0.3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Scala versions: 2.11