jpl-imce / gov.nasa.jpl.imce.oml.tables   0.95.3


Definition of the normalized schema tables for JPL's Ontological Modeling Framework (OMF).

Scala versions: 2.11
Scala.js versions: 0.6

Normalized Database Schema Tables for JPL's Ontological Modeling Language (OML)


Copyrights

Caltech

License

Apache-2.0

Description

This project specifies a set of normalized schema tables for JPL's Ontological Modeling Language. By normalized schema tables, we mean precisely a 4th Normal Form database schema.

This schema is intended to be a single source of truth for technology-neutral data interchange of OMF models. By technology-neutral data interchange, we mean the separation between:

  • the specification of the data to be exchanged among tools,
  • the representation of this data in a particular technology stack.

Normalized schema tables specify the shape of the data to be exchanged in terms of tables with single-valued columns. This means that for each table:

  • each column specifies a simple attribute typed by a scalar datatype (e.g. string, integer, boolean, ...)
  • there are no "multiple values" in a given table row; instead, multiple values are represented as multiple rows.
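As a minimal sketch of the "multiple values become multiple rows" rule, the following Scala snippet flattens a multi-valued attribute into single-valued rows. The table `TerminologyImportRow`, its columns, and the example IRIs are hypothetical and chosen only for illustration; they are not taken from the OML schema.

```scala
// Hypothetical table with two single-valued, scalar columns.
final case class TerminologyImportRow(terminologyIri: String, importedIri: String)

object NormalizeExample {
  // Non-normalized input: one key carrying a multi-valued attribute.
  val imports: Map[String, Seq[String]] = Map(
    "http://example.org/mission" ->
      Seq("http://example.org/base", "http://example.org/project")
  )

  // Normalized form: one row per (terminology, imported) pair.
  def normalize(m: Map[String, Seq[String]]): Seq[TerminologyImportRow] =
    for {
      (iri, imported) <- m.toSeq.sortBy(_._1)
      target <- imported
    } yield TerminologyImportRow(iri, target)

  def main(args: Array[String]): Unit =
    normalize(imports).foreach(println)
}
```

Each resulting row holds exactly one value per column, so a terminology with two imports is represented by two rows rather than by one row with a list.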

The representation of these normalized schema tables is deliberately left open to leverage various technologies and serializations. In particular, representation technologies include but are not limited to:

  • JavaScript
  • Java
  • Scala

In particular, serializations include but are not limited to:

  • JSON
  • RDF (RDF/XML, RDF/JSON, RDF/NTriples, ...)
  • OWL (OWL/XML, RDF/XML, Manchester, ...)
  • XML
  • SQL

The reference serialization for OMF normalized schema tables data is JSON in the following format:

  • Each row of a table is a single-line JSON tuple of name/value pairs, one for each table column.
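The one-row-per-line format can be illustrated with a small, dependency-free Scala sketch. The row type `ConceptRow`, its column names, and the hand-written encoder are assumptions made for this example; the project itself may rely on a JSON library instead, and the choice of sort key below is likewise an assumption.

```scala
object JsonLineExample {
  // Hypothetical row type; the column names are illustrative, not the OML schema.
  final case class ConceptRow(uuid: String, tboxUUID: String, name: String, isAbstract: Boolean)

  // Quote a string as a JSON string literal, escaping what JSON requires.
  def quote(s: String): String = {
    val sb = new StringBuilder
    sb.append('"')
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case c if c < ' ' => sb.append('\\').append('u').append("%04x".format(c.toInt))
      case c => sb.append(c)
    }
    sb.append('"')
    sb.toString
  }

  // One flat JSON object per row on a single line: no arrays, no nested objects.
  def toJsonLine(r: ConceptRow): String =
    s"""{"uuid":${quote(r.uuid)},"tboxUUID":${quote(r.tboxUUID)},"name":${quote(r.name)},"isAbstract":${r.isAbstract}}"""

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      ConceptRow("c-2", "t-1", "Mission", isAbstract = false),
      ConceptRow("c-1", "t-1", "Component \"A\"", isAbstract = true)
    )
    // Sorting by the first column is an assumption; it keeps the output stable across runs.
    rows.sortBy(_.uuid).map(toJsonLine).foreach(println)
  }
}
```

Because every line is a complete JSON object, a consumer can process the file line by line without parsing the whole document first.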

This format is deliberately chosen to facilitate processing OMF data according to the Reactive Manifesto; for example, using Apache Spark.

GIT and *.omlzip archives

For change management purposes, the *.omlzip serialization format yields the following benefits:

  • Minimal, local formatting

    Every serialization involves some kind of formatting. XMI and Xtext serializations of OML involve global formatting: the serialization of OML ModuleElements is indented in the serialization of the containing OML Module.

    In contrast, *.omlzip uses local formatting because each OML object is serialized in a single JSON line. This local formatting is minimal because it involves a tuple of name/value pairs where a value can be a string, a number, a boolean or null (no array values, no nested JSON objects).

  • Fast serialization

    Minimal, local formatting speeds up serialization because it eliminates the overhead of indentation inherent in global formatting.

  • Simple & precise comparisons

    Global formatting complicates comparison because there are two sources of differences to consider:

    • differences in the internal representation of an object
    • differences in the global context where an object is serialized

    The computational complexity of comparing globally formatted representations varies with the particular format. For OML, the XMI and Xtext serializations are inherently tree-based; that is, they are labeled, ordered trees. Many algorithms exist for comparing such trees; depending on the properties of the tree, their time complexity varies between O(n^2 log^2 n) and O(n^4). The Robust Tree Edit Distance (RTED) algorithm achieves an optimal worst-case complexity of O(n^3).

    With *.omlzip, comparison has a linear worst-case complexity of O(n) since each JSON lines file is sorted and each line is a flat, ordered list of name/value pairs. Furthermore, each of the 66 JSON lines files corresponds to one of the 66 concrete OML metaclasses. Therefore, additions/deletions in a particular JSON lines file correspond to creating/deleting instances of the corresponding concrete OML metaclass.

  • GIT friendly

    To configure GIT for simple & precise comparison of *.omlzip files in a GIT project:

    • Add the following to .git/config:
    [diff "zip"]
      textconv = unzip -c -q
    
    • Add the following to .gitattributes:
    *.omlzip diff=zip
    

    Two different *.omlzip archives may have the same contents (as seen by unzip -c -q) but the timestamps in the ZIP archive may differ.

    For example, suppose a GIT repository has a file: example.omlzip. If nothing has changed in the OML contents and a new archive overwrites the existing one, then GIT may see a modification (due to the difference in timestamps in the ZIP metadata) but diffing the contents should confirm there is no significant change.

    $ git status
    
      modified: *.omlzip
    
    $ git diff
    $ 
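
Once the contents of two archives are extracted, the linear-time comparison of sorted JSON lines files described above amounts to a single merge pass. The following Scala sketch shows the idea under the assumption that both inputs are sorted with the same string ordering; the names `SortedLinesDiff` and `Diff` are hypothetical and not part of this project.

```scala
object SortedLinesDiff {
  // Lines present only in the new file (added) or only in the old file (removed).
  final case class Diff(added: Vector[String], removed: Vector[String])

  // Single merge pass over two sorted inputs: at most before.length + after.length steps.
  def diff(before: IndexedSeq[String], after: IndexedSeq[String]): Diff = {
    val added = Vector.newBuilder[String]
    val removed = Vector.newBuilder[String]
    var i = 0
    var j = 0
    while (i < before.length && j < after.length) {
      val c = before(i).compareTo(after(j))
      if (c == 0) { i += 1; j += 1 }                      // same line in both files
      else if (c < 0) { removed += before(i); i += 1 }    // only in the old file
      else { added += after(j); j += 1 }                  // only in the new file
    }
    // Whatever remains on either side has no counterpart.
    while (i < before.length) { removed += before(i); i += 1 }
    while (j < after.length) { added += after(j); j += 1 }
    Diff(added.result(), removed.result())
  }

  def main(args: Array[String]): Unit =
    println(diff(Vector("a", "b", "d"), Vector("a", "c", "d")))
}
```

Since each line represents one OML object, an added or removed line maps directly to a created or deleted instance of the metaclass that the file corresponds to.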
    

Scala as a single-source of truth

The OML normalized schema tables are specified in the Scala programming language:

  • each table is a Scala case class
  • each table column is an immutable field of a Scala case class
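
To make the mapping concrete, here is a minimal sketch of what a table expressed as a Scala case class looks like. `AspectRow` and its columns are hypothetical names used only for illustration; the actual generated classes and their annotations may differ.

```scala
object CaseClassTableExample {
  // Hypothetical table: one immutable field (a val) per column.
  final case class AspectRow(uuid: String, tboxUUID: String, name: String)

  def main(args: Array[String]): Unit = {
    val row = AspectRow("u-1", "t-1", "Component")

    // Fields cannot be reassigned; "changing" a column yields a new row.
    val renamed = row.copy(name = "PerformingElement")

    // Case classes compare by value, so equal columns mean equal rows.
    println(row == AspectRow("u-1", "t-1", "Component"))
    println(renamed)
  }
}
```

Value equality and immutability are what make rows safe to sort, deduplicate, and compare across tools.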

The source code for these normalized schema tables was generated from the OML Specification.

For cross-compilation with Scala.js, tables and column fields are annotated with @JSExport so that they are accessible by name in JavaScript.

Cross-compiling this project results in three distinct libraries:

  1. A JVM library for writing pure Java applications, mixed Java/Scala applications using Scala, or pure Scala applications.

  2. A Scala.js library for writing mixed Java/Scala/JavaScript applications using Scala.js.

  3. An NPM module for developing pure JavaScript applications using conventional JavaScript practices.

All 3 libraries are built on Travis CI and published on Bintray NPM and Bintray Maven.

Polyglot interoperability of the OMF Schema tables.

Java & Scala seem to be OK.

For JavaScript, the accessors use the ScalaJS field names as functions; e.g.:

IDE Support

IntelliJ IDEA (2016 & later)

  • Import the GitHub project from existing sources as an SBT project.

IntelliJ will import the root project (tablesRoot) and the two cross-build variants (tablesJS, tablesJVM). Since all the IntelliJ-specific metadata can be re-created by simply importing the project, it is unnecessary to store this metadata in GitHub.

  • It is possible to work using both IntelliJ IDEA and the SBT CLI in a terminal.

Eclipse Neon.3 Installation

  • Xtext 2.11

  • Xtend 2.11

  • Scala

    Use the Eclipse Marketplace and search for 'Scala'. Install all components of Scala IDE 4.2.x

  • SDKs

    Install the following components by going to Install New Software and searching for...

    • Eclipse EMF SDK
    • Eclipse EMF XCore SDK
    • Eclipse Xtend SDK
    • Eclipse SDK
    • Eclipse EMF Parsley CDO
    • Eclipse EMF/MWE2 runtime & language

Eclipse Projects

There are several projects related to the OMF Schema Tables:

Why is Eclipse so complicated?

Unfortunately, Eclipse lacks good support for SBT projects of any kind. The Eclipse-specific metadata was initially generated with sbt eclipse and subsequently edited as follows:

  • Fix the Eclipse resource links in js/.project and jvm/.project to use location-neutral paths:

    <linkedResources>
      <link>
        <name>jpl.omf.schema.tables-shared-src-main-scala</name>
        <type>2</type>
        <locationURI>PARENT-1-PROJECT_LOC/shared/src/main/scala</locationURI>
      </link>
      <link>
        <name>jpl.omf.schema.tables-shared-src-test-scala</name>
        <type>2</type>
        <locationURI>PARENT-1-PROJECT_LOC/shared/src/test/scala</locationURI>
      </link>
    </linkedResources>
    
  • Define an Eclipse Classpath variable, IVY_CACHE, for the location of the Ivy cache used by SBT (typically, $HOME/.ivy2/cache)

  • Fix the Eclipse library paths in js/.classpath and jvm/.classpath to use the IVY_CACHE classpath variable:

    E.g., in js/.classpath:

     <classpathentry kind="var" path="IVY_CACHE/org.scala-js/scalajs-library_2.11/jars/scalajs-library_2.11-0.6.12.jar"/>
     <classpathentry kind="var" path="IVY_CACHE/com.lihaoyi/upickle_sjs0.6_2.11/jars/upickle_sjs0.6_2.11-0.4.1.jar"/>
     ...
    

    E.g., in jvm/.classpath:

     <classpathentry kind="var" path="IVY_CACHE/org.scala-lang.modules/scala-xml_2.11/bundles/scala-xml_2.11-1.0.2.jar"/>
     <classpathentry kind="var" path="IVY_CACHE/com.lihaoyi/upickle_2.11/jars/upickle_2.11-0.4.1.jar"/>
     ...
    
  • Limitations:

The Eclipse metadata files should be properly generated with sbt eclipse; the above is a workaround! Do not update this project's dependencies by editing only the Eclipse metadata files; instead, update the SBT configuration and then either run sbt eclipse followed by the post-editing above, or update the Eclipse metadata files to match.

Eclipse JavaScript does not properly recognize *.js files written as node shell scripts (See: shared/test/js/)

Eclipse Configuration

Go to Configure Contents and then tick the checkbox for Errors/Warnings on Project.

Publishing to & resolving from bintray.com as a scoped NPM package.

Publishing a scoped NPM package is important for using a combination of multiple NPM repositories for resolving NPM packages:

  • Unscoped packages are resolved against the default NPM repository.
  • Scoped packages are resolved against a scoped entry in the project or user's .npmrc.

For publishing, .npmrc needs:

@imce:registry=https://api.bintray.com/npm/jpl-imce/gov.nasa.jpl.imce.npm/
//api.bintray.com/npm/jpl-imce/gov.nasa.jpl.imce.npm/:username=nrouquette
//api.bintray.com/npm/jpl-imce/gov.nasa.jpl.imce.npm/:_authToken=<base64 API key>
//api.bintray.com/npm/jpl-imce/gov.nasa.jpl.imce.npm/:email=<email address>
//api.bintray.com/npm/jpl-imce/gov.nasa.jpl.imce.npm/:always-auth=true

For resolving, .npmrc needs:

@imce:registry=https://api.bintray.com/npm/jpl-imce/gov.nasa.jpl.imce.npm/
//api.bintray.com/npm/jpl-imce/gov.nasa.jpl.imce.npm/:always-auth=false

Testing JS library

Make sure the Node environment is properly set up:

nvm install stable
nvm use stable
npm install
# See https://github.com/scala-js/scala-js/issues/2902#issuecomment-296776240
npm install <package>@<version>
sbt fullOptJS
node shared/test/js/index