Showcase for IoT Platform Blog
Stratio parent POM for Maven projects.
Use Cascading Taps and Scalding DSL with Spark
Writing application logic for Spark jobs that can be unit-tested without a SparkContext
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
Building Annoy Index on Apache Spark
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
SANSA RDF Library
Apache Spark AWS Lambda Executor (SAMBA)
Joins for skewed datasets in Spark
Natural Korean Processor for Apache Spark
Schema registry for CSV, TSV, JSON, Avro and Parquet schemas. Supports schema inference and a GraphQL API.
Google Spreadsheets datasource for Spark SQL and DataFrames
Elasticsearch integration for Apache Spark
PageRank in Spark
This project is used to capture machine learning pipelines created on top of Spark as OK
Machine learning for genomic variants
Bucketing and partitioning system for Parquet
A Spark package for retrieving data from Google Analytics