snowplow-incubator / schema-ddl

ASTs and generators for producing various DDL and Schema formats

Schema DDL

Schema DDL is a set of abstract syntax trees (ASTs) and generators for producing various DDL and schema formats. It is tightly coupled with other tools from the Snowplow Platform, such as Iglu and Self-describing JSON.

Schema DDL itself does not provide a CLI; it exposes only a Scala API.

Quickstart

Schema DDL is compiled against Scala 2.11 and 2.12 and is available on Maven Central. To use it with SBT, include the following module:

libraryDependencies += "com.snowplowanalytics" %% "schema-ddl" % "0.8.0"

Current features

Flatten Schema

To process a JSON Schema in a type-safe manner, it is sometimes necessary to represent its nested structure as a map of paths to properties. schemaddl.generators.SchemaFlattener.flattenJsonSchema can be used for that. It accepts a JSON Schema as a json4s.JValue and returns a schemaddl.FlatSchema.
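The idea of "a map of paths to properties" can be illustrated with a minimal, self-contained sketch. It does not call Schema DDL: the Node, Obj and Leaf types and the FlattenSketch object are hypothetical stand-ins invented for this example, and the real FlatSchema may be shaped differently.

```scala
// Hypothetical minimal model of a nested schema; these are NOT Schema DDL types.
sealed trait Node
final case class Obj(properties: Map[String, Node]) extends Node
final case class Leaf(attrs: Map[String, String]) extends Node

object FlattenSketch {
  /** Collect every leaf property under its dot-separated path. */
  def flatten(node: Node, prefix: List[String] = Nil): Map[String, Map[String, String]] =
    node match {
      // A leaf contributes a single entry keyed by the path walked so far.
      case Leaf(attrs) => Map(prefix.reverse.mkString(".") -> attrs)
      // An object contributes the entries of all of its children.
      case Obj(props) =>
        props.flatMap { case (name, child) => flatten(child, name :: prefix) }
    }
}
```

With this model, a property "id" nested inside an object "user" ends up under the key "user.id", so the nesting is encoded in the key rather than in the structure.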

Redshift DDL

The main feature of Schema DDL at present is producing Redshift table DDL (with or without Snowplow-specific data). The schemaddl.generators.redshift.getTableDdl method can be used for that. It accepts a schemaddl.FlatSchema and produces a Redshift DDL file, together with warnings about constructs such as product types (e.g. boolean, string) that cannot be translated into DDL correctly without some manual work.

There is also the schemaddl.generators.redshift.Ddl module, which provides AST-like structures for generating DDL in a flexible and type-safe manner.
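To give a feel for what "AST-like structures" means here, the following is a simplified sketch of a DDL AST that renders itself to SQL. Every name in it (DataType, SqlBoolean, SqlBigInt, SqlVarchar, ColumnSketch, TableSketch) is hypothetical and chosen for illustration; the actual Ddl module is richer and its types and names may differ.

```scala
// Hypothetical, simplified DDL AST; NOT the real schemaddl Ddl module.
sealed trait DataType { def toDdl: String }
case object SqlBoolean extends DataType { val toDdl = "BOOLEAN" }
case object SqlBigInt extends DataType { val toDdl = "BIGINT" }
final case class SqlVarchar(size: Int) extends DataType {
  def toDdl: String = s"VARCHAR($size)"
}

final case class ColumnSketch(name: String, dataType: DataType, notNull: Boolean = false) {
  // Render a single column definition, e.g. "id BIGINT NOT NULL".
  def toDdl: String = {
    val constraint = if (notNull) " NOT NULL" else ""
    s"$name ${dataType.toDdl}$constraint"
  }
}

final case class TableSketch(name: String, columns: List[ColumnSketch]) {
  // Render the whole statement, one indented column per line.
  def toDdl: String =
    columns
      .map(c => "  " + c.toDdl)
      .mkString("CREATE TABLE IF NOT EXISTS " + name + " (\n", ",\n", "\n);")
}
```

Because the table is a value rather than a string, columns can be filtered, reordered or inspected before anything is rendered, and an unsupported type is a compile error rather than a malformed statement.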

JSON Paths

Amazon Redshift uses the COPY command to load data into a table. A JSONPaths file is used to map the data into columns. It can be generated with the schemaddl.generators.redshift.JsonPathGenerator.getJsonPathsFile method, which accepts a list of schemaddl.generators.redshift.Ddl.Column objects (which can be taken from the Table DDL object) and returns the JSONPaths file as a string. The generator is coupled with the Table object in order to preserve the structure of the table. For example, you may want to modify your list of Columns by rearranging it depending on some properties, but the JSONPaths file must always list fields in the same order as the table's columns, and so we cannot rely on the FlatSchema object.
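The ordering requirement can be made concrete with a small sketch that renders a JSONPaths document from an ordered list of column paths. It is not the library's JsonPathGenerator: the JsonPathsSketch object and its render method are hypothetical, and the sketch assumes each column maps to a dot-notation path named after it.

```scala
// Hypothetical helper; NOT schemaddl's JsonPathGenerator.
object JsonPathsSketch {
  /** Render a Redshift JSONPaths document, preserving the order of `columnPaths`. */
  def render(columnPaths: List[String]): String = {
    // One quoted "$.<path>" expression per column, in table order.
    val entries = columnPaths.map(p => "    \"$." + p + "\"")
    "{\n  \"jsonpaths\": [\n" + entries.mkString(",\n") + "\n  ]\n}"
  }
}
```

The input is a List rather than a Map or Set precisely because COPY matches the n-th path expression to the n-th table column, so the order of the list has to be the order of the table.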

Copyright and License

Schema DDL is copyright 2014-2017 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.