datamindedbe / lighthouse

Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.

Lighthouse

Principles

  • Configuration as code
  • Idempotent execution
  • Utilities that make it easier to build and test Apache Spark-based applications
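
To make the first two principles concrete, here is a minimal sketch in plain Apache Spark. It does not use Lighthouse's own API: the `Target` case class, `writeIdempotently`, and `runTwice` are hypothetical names chosen for illustration, and the sample rows and run date are made up. The idea is that the output location is described by a typed value in code, and that writing to a deterministic path in overwrite mode leaves the same result no matter how often the job is re-run.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object IdempotentWriteSketch {

  // Configuration as code: the output location is a typed value,
  // not a string assembled from an external properties file.
  // (Hypothetical type, not part of Lighthouse.)
  final case class Target(basePath: String, runDate: String) {
    def path: String = s"$basePath/run_date=$runDate"
  }

  // Idempotent execution: the path depends only on the run date and the
  // write replaces whatever is already there, so re-runs do not duplicate data.
  def writeIdempotently(df: DataFrame, target: Target): Unit =
    df.write.mode(SaveMode.Overwrite).parquet(target.path)

  // Writes the same two rows twice to the same target and returns the row count.
  def runTwice(spark: SparkSession): Long = {
    import spark.implicits._
    val orders = Seq((1, "book"), (2, "pen")).toDF("id", "item")
    val target = Target(Files.createTempDirectory("orders").toString, "2019-01-01")
    writeIdempotently(orders, target)
    writeIdempotently(orders, target)
    spark.read.parquet(target.path).count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("idempotent-write-sketch")
      .getOrCreate()
    try println(runTwice(spark))
    finally spark.stop()
  }
}
```

After both writes the target still holds two rows. With `SaveMode.Append` the second run would have doubled them, which is the failure mode idempotent execution is meant to rule out.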

Start using Lighthouse

In your build.sbt, add this:

libraryDependencies += "be.dataminded" %% "lighthouse" % "<version>"
libraryDependencies += "be.dataminded" %% "lighthouse-testing" % "<version>" % Test

If you are using Maven, add this to your pom.xml (the _2.11 suffix on each artifactId is the Scala binary version):

<dependency>
    <groupId>be.dataminded</groupId>
    <artifactId>lighthouse_2.11</artifactId>
    <version>[version]</version>
</dependency>
<dependency>
    <groupId>be.dataminded</groupId>
    <artifactId>lighthouse-testing_2.11</artifactId>
    <version>[version]</version>
    <scope>test</scope>
</dependency>

Online Documentation

This README contains only basic instructions. A website is under construction and will provide more information and examples.