EtlFlow is a functional library in Scala for writing ETL jobs.
Library documentation: https://tharwaninitin.github.io/etlflow/site/
Scala test coverage report: https://tharwaninitin.github.io/etlflow/testcovrep/
All the tests are integration tests. That is, they make real API requests to S3, GCS, and BigQuery. As such, you'll need to set variables pointing to buckets and objects that you can access and manipulate.
Here are all the environment variables you will need to set to run the tests locally:

```shell
# Full path to a GCP Service Account key JSON with GCS and BigQuery read/write access
export GOOGLE_APPLICATION_CREDENTIALS=<...>
export GCS_BUCKET=<...>
# AWS credentials with access to the S3 bucket below
export ACCESS_KEY=<...>
export SECRET_KEY=<...>
export S3_BUCKET=<...>
```
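Before running the suite, it can save a failed run to confirm everything is exported. A small sanity-check sketch (the variable names are the ones listed above; the script itself is just a convenience, not part of the project):

```shell
# Check that each variable required by the integration tests is set.
missing=0
for v in GOOGLE_APPLICATION_CREDENTIALS GCS_BUCKET ACCESS_KEY SECRET_KEY S3_BUCKET; do
  if [ -z "$(eval echo "\$$v")" ]; then
    echo "Missing: $v"
    missing=1
  fi
done
if [ "$missing" -eq 0 ]; then
  echo "All test variables are set"
else
  echo "Set the missing variables before running the tests"
fi
```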
Change the region in TestSuiteHelper.scala to the AWS region of your S3 bucket. You will also need Docker installed, as some of the tests start and stop database containers.
Now run the tests using the sbt command below:

```shell
sbt "project core" test
```
Requirements and Installation
This project is compiled with Scala version 2.12.10 and works with Apache Spark versions 2.4.x. It is available via Maven Central. Add the latest release as a dependency to your project:
```xml
<dependency>
    <groupId>com.github.tharwaninitin</groupId>
    <artifactId>etlflow-core_2.12</artifactId>
    <version>x.x.x</version>
</dependency>
```
```scala
libraryDependencies += "com.github.tharwaninitin" %% "etlflow-core" % "x.x.x"
```
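Putting the requirements above together, a minimal build.sbt sketch might look like the following. The Scala and Spark versions are the ones stated above; `x.x.x` remains a placeholder for the latest release, and the `spark-core` line is only an illustration of pairing EtlFlow with a Spark 2.4.x distribution:

```scala
// build.sbt -- minimal sketch, not a definitive setup
scalaVersion := "2.12.10"

libraryDependencies ++= Seq(
  // Replace x.x.x with the latest EtlFlow release from Maven Central
  "com.github.tharwaninitin" %% "etlflow-core" % "x.x.x",
  // Spark 2.4.x is the supported line; marked "provided" since clusters ship their own Spark
  "org.apache.spark" %% "spark-core" % "2.4.7" % "provided"
)
```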
Please feel free to open issues to report bugs or to propose new features.