-
catboost/catboost
A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression, and other machine learning tasks in Python, R, Java, and C++. Supports computation on CPU and GPU.
Scala (JVM): 2.11 2.12 -
h2oai/h2o-3
Open Source Fast Scalable Machine Learning Platform For Smarter Applications: Deep Learning, Gradient Boosting & XGBoost, Random Forest, Generalized Linear Modeling (Logistic Regression, Elastic Net), K-Means, PCA, Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Scala (JVM): 2.10 2.11 -
yotpoltd/metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Scala (JVM): 2.11 2.12 -
hydrospheredata/mist
Serverless proxy for Spark cluster
Scala (JVM): 2.10 2.11 2.12 -
locationtech-labs/geopyspark
GeoTrellis for PySpark
Scala (JVM): 2.11 -
clustering4ever/clustering4ever
C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.
Scala (JVM): 2.11 -
sparkling-graph/sparkling-graph
SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.
Scala (JVM): 2.10 2.11 -
setl-framework/setl
A simple Spark-powered ETL framework that just works 🍺
Scala (JVM): 2.11 2.12 -
swoop-inc/spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Scala (JVM): 2.12 -
setl-developers/setl
A simple Spark-powered ETL framework that just works 🍺
Scala (JVM): 2.11 2.12 -
jcdecaux/setl
A simple Spark ETL framework that just works 🍺
Scala (JVM): 2.11 2.12