Note
Hyperion in this repo is no longer receiving updates. All new pull requests should be made to https://github.com/salesforce/hyperion
Krux Hyperion
In StarCraft, the Hyperion is a Behemoth-class battlecruiser. During the Second Great War, Raynor's Raiders made strategic decisions on the Hyperion's bridge -- the battlecruiser's command center.
A Scala library and set of abstractions for AWS Data Pipeline.
Problem Statement
This project aims to solve the following problem:
- Make it easy to define an AWS Data Pipeline pipeline using a clear, fluent Scala DSL
Configuration
Add the Sonatype.org Releases repo as a resolver in your build.sbt or Build.scala, as appropriate.
resolvers += Resolver.sonatypeRepo("releases")
Add Krux Hyperion as a dependency in your build.sbt or Build.scala, as appropriate.
libraryDependencies ++= Seq(
// Other dependencies ...
"com.krux" %% "hyperion" % "5.5.0"
)
Scala Versions
This project is compiled, tested, and published for the following Scala versions:
- 2.11.12
- 2.12.11
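Putting the resolver, the dependency, and the Scala version together, a build.sbt for a project that uses Hyperion might look roughly like the sketch below. The project name is a placeholder, and the crossScalaVersions line is only needed if your own project is cross-built; neither comes from this README.

```scala
// build.sbt (sketch; "my-pipelines" is a placeholder project name)
name := "my-pipelines"

// Pick one of the Scala versions Hyperion is published for.
scalaVersion := "2.12.11"

// Optional: cross-build your own project against both published versions.
crossScalaVersions := Seq("2.11.12", "2.12.11")

resolvers += Resolver.sonatypeRepo("releases")

// %% appends the Scala binary version (_2.11 or _2.12) to the artifact name.
libraryDependencies += "com.krux" %% "hyperion" % "5.5.0"
```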
Usage
Setup
Pipeline Scripts
Some pipeline steps need supporting scripts for execution. These scripts need to be uploaded to an S3 bucket where AWS Data Pipeline can access them.
Configure an S3 bucket and upload the scripts to that bucket with the following command:
$ ./deploy-scripts.sh s3://your-bucket/scripts
In your pipeline configuration, be sure to set hyperion.script.uri = s3://your-bucket/scripts/
Creating a pipeline
To create a new pipeline, create a Scala class in com.krux.datapipeline.pipelines.
Look at ExampleSpark for an example pipeline.
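ExampleSpark is not reproduced here. For readers new to the idea, the toy sketch below shows what a "fluent DSL that renders to JSON" can look like in general. Every type and method in it (Activity, ToyPipeline, after, add, toJson) is hypothetical and is not part of Hyperion's API, and the JSON it prints is not a valid AWS Data Pipeline definition.

```scala
// Toy illustration only: these types are hypothetical and are NOT Hyperion's API.
final case class Activity(id: String, command: String, dependsOn: Seq[String] = Seq.empty) {
  // Fluent step: returns a copy that depends on the given activity.
  def after(other: Activity): Activity = copy(dependsOn = dependsOn :+ other.id)
}

final case class ToyPipeline(name: String, activities: Seq[Activity] = Seq.empty) {
  // Fluent step: returns a copy with one more activity appended.
  def add(activity: Activity): ToyPipeline = copy(activities = activities :+ activity)

  // Renders a minimal JSON document by hand.
  // No escaping is done; ids and commands are assumed to be plain text.
  def toJson: String = {
    val objects = activities.map { a =>
      val deps = a.dependsOn.map(d => "\"" + d + "\"").mkString("[", ",", "]")
      s"""{"id":"${a.id}","command":"${a.command}","dependsOn":$deps}"""
    }
    val body = objects.mkString("[", ",", "]")
    s"""{"name":"$name","objects":$body}"""
  }
}

object ToyPipelineExample {
  def main(args: Array[String]): Unit = {
    val extract = Activity("extract", "echo extract")
    val load = Activity("load", "echo load").after(extract)
    // Chain the fluent calls, then print the rendered definition.
    println(ToyPipeline("ThePipeline").add(extract).add(load).toJson)
  }
}
```

The point of the immutable, copy-returning methods is that each call yields a new value, so partial definitions can be shared and extended safely. For the real DSL, refer to ExampleSpark and the Scaladoc.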
Manually uploading
To generate a JSON file describing the pipeline, ensure you have created the assembly:
$ sbt assembly
Then, run Krux Hyperion with the class name (specify the external jar location if it's not in the classpath):
$ ./hyperion [-jar your-jar-implementing-pipelines.jar] your.pipelines.ThePipeline generate > ThePipeline.json
Then go to the AWS Data Pipeline Management Console and click Create new pipeline. Enter the class name for Name, click Import a definition, and select Load local file. Finally, click Activate.
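If you would rather drive the generate step from Scala than from a shell, a minimal sketch using the standard scala.sys.process package could look like the following. It assumes the hyperion launcher script is in the current working directory; the object name, the helper name, and the output file name are illustrative only.

```scala
import java.io.File
import scala.sys.process._

object GeneratePipelineJson {
  // Builds the documented command line: ./hyperion [-jar <jar>] <class> generate
  def generateCommand(pipelineClass: String, jar: Option[String] = None): Seq[String] = {
    val jarArgs = jar.toSeq.flatMap(path => Seq("-jar", path))
    Seq("./hyperion") ++ jarArgs ++ Seq(pipelineClass, "generate")
  }

  def main(args: Array[String]): Unit = {
    val pipelineClass = "your.pipelines.ThePipeline" // placeholder class name
    val output = new File("ThePipeline.json")
    // Run the launcher and redirect its standard output to the JSON file.
    val exitCode = (Process(generateCommand(pipelineClass)) #> output).!
    if (exitCode != 0) sys.error(s"hyperion exited with code $exitCode")
  }
}
```

Running the process throws an IOException if ./hyperion cannot be found, so run this from the directory that contains the launcher.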
Automatically uploading
To create a pipeline automatically, ensure you have created the assembly:
$ sbt assembly
Then, run Krux Hyperion with create and the class name:
$ ./hyperion [-jar your-jar-implementing-pipelines.jar] your.pipelines.ThePipeline create
This uses the AWS Data Pipeline API to create the pipeline and put the pipeline definition.
Activating a pipeline
You can activate a pipeline in the Data Pipeline Management Console, by passing the --activate option to the create command, or by using the activate command with the pipeline ID:
$ ./hyperion activate df-1234567890
Scaladoc API
The Scaladoc API for this project can be found here.
License
Krux Hyperion is licensed under APL 2.0.
Note
Due to an AWS Data Pipeline bug, all schemas involving data pipelines need to be available in the default search_path.
For more details: https://forums.aws.amazon.com/thread.jspa?threadID=166340