Krux Hyperion

In Starcraft, the Hyperion is a Behemoth-class battlecruiser. During the Second Great War, Raynor's Raiders made strategic decisions on the Hyperion's bridge -- the battlecruiser's command center.

A Scala library and abstractions for AWS DataPipeline.

Problem Statement

This project aims to solve the following problem:

  1. Make it easy to define an AWS DataPipeline using a clear, fluent Scala DSL

Configuration

Add the Sonatype.org Releases repo as a resolver in your build.sbt or Build.scala as appropriate.

resolvers += Resolver.sonatypeRepo("releases")

Add Krux Hyperion as a dependency in your build.sbt or Build.scala as appropriate.

libraryDependencies ++= Seq(
  // Other dependencies ...
  "com.krux" %% "hyperion" % "5.5.0"
)

Scala Versions

This project is compiled, tested, and published for the following Scala versions:

  1. 2.11.12
  2. 2.12.9

Usage

Setup

Pipeline Scripts

Some pipeline steps need supporting scripts for execution. These scripts need to be uploaded to an S3 bucket where AWS Data Pipeline can access them.

Configure an S3 bucket and upload the scripts to that bucket with the following command:

$ ./deploy-scripts.sh s3://your-bucket/scripts

In your pipeline configuration, be sure to set hyperion.script.uri = s3://your-bucket/scripts/ so that it points to the location where the scripts were uploaded.
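The deploy command above is given the prefix without a trailing slash, while the setting is shown with one. As a minimal sketch, the helper below normalizes a prefix and prints the matching configuration line. The function names normalize_script_uri and script_uri_setting are hypothetical and not part of Hyperion; quote the value if your configuration format requires it.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hyperion).

# Normalize an S3 prefix so that it ends in exactly one trailing slash.
normalize_script_uri() {
  uri=$1
  # Strip every trailing slash, then add a single one back.
  while [ "${uri%/}" != "$uri" ]; do
    uri=${uri%/}
  done
  printf '%s/\n' "$uri"
}

# Print the configuration line for a given upload location.
script_uri_setting() {
  printf 'hyperion.script.uri = %s\n' "$(normalize_script_uri "$1")"
}

# Usage: upload the scripts, then print the setting to copy into the config.
#   ./deploy-scripts.sh s3://your-bucket/scripts
#   script_uri_setting s3://your-bucket/scripts
```

Deriving both values from one variable keeps the upload location and the configured URI from drifting apart.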

Creating a pipeline

To create a new pipeline, create a Scala class in com.krux.datapipeline.pipelines. Look at ExampleSpark for an example pipeline.

Manually uploading

To generate a JSON file describing the pipeline, ensure you have created the assembly:

$ sbt assembly

Then, run Krux Hyperion with the class name (specify the external jar location if it's not in the classpath):

$ ./hyperion [-jar your-jar-implementing-pipelines.jar] your.pipelines.ThePipeline generate > ThePipeline.json

Then go to the AWS Data Pipeline Management Console and click Create new pipeline. Enter the class name for Name, click Import a definition, and select Load local file to choose the generated JSON file. Finally, click Activate.
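The build and generate steps can be wrapped in one shell function so that the JSON file is always produced from a fresh assembly. This is a sketch, not part of Hyperion: the function name generate_definition is hypothetical, the output file is named after the simple class name as in the example command, and treating an empty file as a failure is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Hyperion).
# $1 = fully qualified pipeline class, e.g. your.pipelines.ThePipeline
# $2 = optional jar implementing the pipeline (omit if it is on the classpath)
generate_definition() {
  class=$1
  jar=${2:-}
  # Name the file after the simple class name, e.g. ThePipeline.json.
  out="${class##*.}.json"

  # Send build output to stderr so stdout carries only the file name.
  sbt assembly >&2 || return 1

  if [ -n "$jar" ]; then
    ./hyperion -jar "$jar" "$class" generate > "$out" || return 1
  else
    ./hyperion "$class" generate > "$out" || return 1
  fi

  # Assumption: an empty file is treated as a failed generation.
  if [ ! -s "$out" ]; then
    echo "generated definition $out is empty" >&2
    return 1
  fi
  echo "$out"
}

# Usage:
#   generate_definition your.pipelines.ThePipeline
#   generate_definition your.pipelines.ThePipeline your-jar-implementing-pipelines.jar
```

The function prints the name of the generated file, which is the file to pick under Load local file in the console.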

Automatically uploading

To create a pipeline automatically, ensure you have created the assembly:

$ sbt assembly

Then, run Krux Hyperion with create and the class name:

$ ./hyperion [-jar your-jar-implementing-pipelines.jar] your.pipelines.ThePipeline create

This will use the DataPipeline API to create the pipeline and put the pipeline definition.

Activating a pipeline

You can activate a pipeline in one of three ways: in the Data Pipeline Management Console, by passing the --activate option to the create command, or by running the activate command with the pipeline ID.

$ ./hyperion activate df-1234567890
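Unlike generate and create, the activate command above is given a pipeline ID rather than a class name. A small guard can catch that mix-up before anything is sent to AWS. This is a sketch under assumptions: the function names are hypothetical, and the check only assumes that IDs start with "df-", as in the example above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hyperion).

# Succeeds when $1 looks like a pipeline ID such as df-1234567890.
# Assumption: IDs start with "df-" followed by at least one character.
is_pipeline_id() {
  case $1 in
    df-?*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Activate a pipeline by ID, rejecting anything that is not ID-shaped.
activate_pipeline() {
  id=${1:-}
  if ! is_pipeline_id "$id"; then
    echo "expected a pipeline id like df-1234567890, got: '$id'" >&2
    return 2
  fi
  ./hyperion activate "$id"
}

# Usage:
#   activate_pipeline df-1234567890
```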

Scaladoc API

The Scaladoc API for this project can be found here.

License

Krux Hyperion is licensed under the Apache License 2.0 (APL 2.0).

Note

Due to an AWS DataPipeline bug, every database schema that a pipeline uses must be available in the database's default search_path.

For more details: https://forums.aws.amazon.com/thread.jspa?threadID=166340