- swoop-inc/spark-records 3.0.1
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Scala versions: 2.12
- dataflint/spark 0.2.6
Performance Observability for Apache Spark
Scala versions: 2.13 2.12
- locationtech-labs/geopyspark 0.3.0
GeoTrellis for PySpark
Scala versions: 2.11
- timgent/data-flare 3.2.0_0.1.14
Data quality control tool built on Spark and Deequ
Scala versions: 2.12
- absaoss/pramen 1.10.1
Resilient data pipeline framework running on Apache Spark
Scala versions: 2.13 2.12 2.11
- grouzen/zio-apache-parquet 0.1.4
Scala ZIO-powered Apache Parquet library
Scala versions: 3.x 2.13
- apache/incubator-wayang 0.7.1
Apache Wayang (incubating) is the first cross-platform data processing system.
Scala versions: 2.12 2.11
- grouzen/zio-apache-arrow 0.1.2
Scala ZIO-powered Apache Arrow library
Scala versions: 3.x 2.13 2.12
- databeans/lighthouse 0.1.0
Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations should be performed.
Scala versions: 2.12
- catboost/catboost 1.2.7
A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Scala versions: 2.13 2.12
- h2oai/h2o-3 3.30.0.3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Scala versions: 2.11
- ytsaurus/ytsaurus 2.4.1
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Scala versions: 2.12