Stratio parent POM for Maven projects.
Showcase for IoT Platform Blog
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
A library based on Delta for Spark and MLSQL
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
Use Cascading Taps and Scalding DSL with Spark
Writing application logic for Spark jobs that can be unit-tested without a SparkContext
Building Annoy Index on Apache Spark
SANSA RDF Library
A simple Spark-powered ETL framework that just works 🍺
This project is used to capture machine learning pipelines created on top of Spark as OK
Apache Spark AWS Lambda Executor (SAMBA)
Joins for skewed datasets in Spark
Google Spreadsheets datasource for SparkSQL and DataFrames
Schema registry for CSV, TSV, JSON, Avro and Parquet schemas. Supports schema inference and a GraphQL API.
Natural Korean Processor for Apache Spark
PageRank in Spark
Elasticsearch integration for Apache Spark
SDK for open-source frameworks to interact with MaxCompute
Machine learning for genomic variants