-
smart-data-lake/smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Scala versions: 2.12 2.11 -
sansa-stack/sansa-stack
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Scala versions: 2.12 2.11 -
galliaproject/gallia-core
A schema-aware Scala library for data transformation
Scala versions: 2.13 2.12 -
swoop-inc/spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Scala versions: 2.12 -
googleclouddataproc/spark-bigquery-connector
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Scala versions: 2.13 2.12 2.11 -
jelmerk/hnswlib
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Scala versions: 2.13 2.12 2.11 -
simplexspatial/osm4scala
Scala and Spark library focused on reading OpenStreetMap PBF files.
Scala versions: 2.13 2.12 2.11 2.10 -
databrickslabs/automl-toolkit
Toolkit for Apache Spark ML for feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
Scala versions: 2.11 -
mrpowers/spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
Scala versions: 2.13 2.12 2.11 -
potix2/spark-google-spreadsheets
Google Spreadsheets datasource for SparkSQL and DataFrames
Scala versions: 2.11 2.10