tharwaninitin / etlflow

Functional, Composable library in Scala based on ZIO for writing ETL jobs in AWS and GCP https://tharwaninitin.github.io/etlflow/site/


EtlFlow


EtlFlow is a functional Scala library for writing ETL jobs.

Documentation

Library Documentation https://tharwaninitin.github.io/etlflow/site/

Scala Test Coverage Report https://tharwaninitin.github.io/etlflow/testcovrep/

Running Tests

All the tests are integration tests: they make real API requests to S3, GCS, and BigQuery. You therefore need environment variables that point to buckets and objects you can access and modify.

Set the following to run the tests locally:

# Full path to a GCP service account key JSON file with GCS and BigQuery read/write access
export GOOGLE_APPLICATION_CREDENTIALS=<...>
export GCS_BUCKET=<...>
export ACCESS_KEY=<...>
export SECRET_KEY=<...>
export S3_BUCKET=<...>

Change the region in TestSuiteHelper.scala to the AWS region of your S3 bucket. You also need Docker installed, because some of the tests start and stop database Docker containers.
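Because the tests call real services, a missing variable usually surfaces only as a failure partway through the run. The following is a minimal pre-flight sketch, not part of EtlFlow: the function names `missing_vars`, `has_docker`, and `preflight` are hypothetical, and the only assumption is that the five variables listed above and the `docker` CLI are what the tests need.

```shell
# Hypothetical pre-flight helper; not shipped with EtlFlow.

# The variables the instructions above ask you to export.
REQUIRED_VARS="GOOGLE_APPLICATION_CREDENTIALS GCS_BUCKET ACCESS_KEY SECRET_KEY S3_BUCKET"

# Print the name of every required variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise.
missing_vars() {
  status=0
  for name in $REQUIRED_VARS; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      status=1
    fi
  done
  return $status
}

# Succeeds if the docker CLI is on PATH.
# This does not check that the Docker daemon is running.
has_docker() {
  command -v docker >/dev/null 2>&1
}

# Run all checks; print "ok" on success, a reason on stderr otherwise.
preflight() {
  if ! missing="$(missing_vars)"; then
    echo "Missing environment variables:" $missing >&2
    return 1
  fi
  if [ ! -f "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS does not point to a file" >&2
    return 1
  fi
  if ! has_docker; then
    echo "docker not found on PATH" >&2
    return 1
  fi
  echo "ok"
}
```

After sourcing these functions, `preflight && sbt "project core" test` runs the suite only when the checks pass. The sketch does not verify that the credentials are valid or that the buckets exist.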

Now run the tests with the following sbt command:

sbt "project core" test

Requirements and Installation

This project is compiled with Scala 2.12.10 and works with Apache Spark 2.4.x. It is available via Maven Central; add the latest release as a dependency to your project.


Maven

<dependency>
    <groupId>com.github.tharwaninitin</groupId>
    <artifactId>etlflow-core_2.12</artifactId>
    <version>x.x.x</version>
</dependency>

SBT

libraryDependencies += "com.github.tharwaninitin" %% "etlflow-core" % "x.x.x"

Contributions

Please feel free to open issues to report bugs or propose new features.