System to manage human-readable tables using files
Tabulas is a system to manage human-readable tables using files. Tabulas is a Scala implementation based on the Tabula format. There are three alternatives to represent the content:
- Tabula.YAML, using the YAML format,
- Tabula.JSON, using the JSON format,
- Tabula.Properties, using a variant of the Java Properties syntax in which the same property name can be used for multiple objects.
In addition, there are two alternatives to export the metadata as a schema:
- JSON Schema, a vocabulary to annotate and validate JSON documents
- Rx YAML, for Rx, a schema tool for JSON/YAML
Download:
- executable JAR file
- The Central Repository
- as a Maven dependency:
<dependency>
<groupId>de.tu-dresden.inf.lat.tabulas</groupId>
<artifactId>tabulas-ext_2.13</artifactId>
<version>1.1.0</version>
</dependency>
The Tabula format is a system that puts constraints on other formats. It could be viewed as a simplified type system.
The primitive types are:
`String`
: any string without any newline (`'\n'` 0x0A, `'\r'` 0x0D) that does not end in a backslash (`'\'` 0x5C) or in a blank (`'\t'` 0x09, `' '` 0x20)
`URI`
: any valid Uniform Resource Identifier
`Integer`
: an integer number (implemented with `BigInteger`)
`Decimal`
: a decimal number (implemented with `BigDecimal`)
`List_`... (e.g. `List_String`)
: list of space-separated values of one of the types above
`Empty`
: type to ignore any given value
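The primitive types above can be read as validation rules on raw text values. The following Python sketch is a hypothetical illustration, not the Scala implementation: the helper names `is_tabula_string` and `conforms` are assumptions, and the `URI` check is deliberately simplified to "has a scheme".

```python
from decimal import Decimal, InvalidOperation
from urllib.parse import urlparse


def is_tabula_string(s: str) -> bool:
    """No newline, and no trailing backslash, tab, or space."""
    if "\n" in s or "\r" in s:
        return False
    return not s.endswith(("\\", "\t", " "))


def conforms(value: str, field_type: str) -> bool:
    """Rough check of a raw value against a Tabula primitive type."""
    if field_type == "Empty":
        return True  # any value is accepted and ignored
    if field_type.startswith("List_"):
        element_type = field_type[len("List_"):]
        # List_ values are space-separated elements of the element type.
        return all(conforms(v, element_type) for v in value.split())
    if not is_tabula_string(value):
        return False
    if field_type == "String":
        return True
    if field_type == "URI":
        # Simplification: only require a scheme such as "https".
        return urlparse(value).scheme != ""
    if field_type == "Integer":
        try:
            int(value)  # Python ints are unbounded, similar to BigInteger
        except ValueError:
            return False
        return True
    if field_type == "Decimal":
        try:
            Decimal(value)  # plays a role similar to BigDecimal
        except InvalidOperation:
            return False
        return True
    raise ValueError(f"unknown type: {field_type}")
```

For example, `conforms("1 2 3", "List_Integer")` is true, while `conforms("4.2", "Integer")` is false.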
A composite type is a structure containing fields, each of them of a particular primitive type. Each instance may contain values of the defined fields.
For the sake of clarity, we can compare this to a spreadsheet, with the following associations:
- primitive type: allowed type in the spreadsheet cells
- composite type: first row of the spreadsheet defining the column names
- field: a column, with all its cells of the same type
- instance: a row
This is how types are defined in the Tabula.YAML format.
The type name can be any Tabula String.
The field name can be any Tabula String that does not contain a colon (`':'` 0x3A) or an equals sign (`'='` 0x3D), and that is not one of the words `type` or `new`.
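The field-name rule can be expressed as a small predicate. This is a hypothetical Python sketch (the name `is_valid_field_name` is not part of Tabulas); it repeats the Tabula String check inline so that it stands on its own.

```python
RESERVED_WORDS = {"type", "new"}


def is_valid_field_name(name: str) -> bool:
    """A Tabula String without ':' or '=' that is not a reserved word."""
    tabula_ok = ("\n" not in name and "\r" not in name
                 and not name.endswith(("\\", "\t", " ")))
    return (tabula_ok
            and ":" not in name
            and "=" not in name
            and name not in RESERVED_WORDS)
```

Reserving `type` and `new` matters because, in the Tabula.Properties format, those words mark the start of the type definition and of each new instance.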
Each type is defined as follows:
---
- type:
name: TYPE_NAME
where TYPE_NAME can be any identifier.
The fields are defined as follows:
def:
- FIELD_NAME_0:FIELD_TYPE_0
- FIELD_NAME_1:FIELD_TYPE_1
...
- FIELD_NAME_n:FIELD_TYPE_n
where each FIELD_NAME can be any identifier,
and each FIELD_TYPE can be any of the primitive types.
There must be no space before or after the colon.
For example, write `FIELD_NAME_0:FIELD_TYPE_0`, not `FIELD_NAME_0: FIELD_TYPE_0`.
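Reading the `def` entries amounts to splitting each one at the colon and checking the type name. The Python sketch below is an illustration under stated assumptions (the function name `parse_definition` and the error messages are hypothetical); it keeps the declaration order of the fields and rejects spaces around the colon.

```python
from typing import Dict, List

PRIMITIVE_TYPES = {"String", "URI", "Integer", "Decimal", "Empty"}


def parse_definition(entries: List[str]) -> Dict[str, str]:
    """Turn entries like 'id:String' into an ordered {field name: type} map."""
    fields: Dict[str, str] = {}
    for entry in entries:
        name, sep, field_type = entry.partition(":")
        if not sep:
            raise ValueError(f"missing colon in {entry!r}")
        if name != name.strip() or field_type != field_type.strip():
            raise ValueError(f"no spaces allowed around the colon: {entry!r}")
        # List_X is valid when X is one of the primitive types.
        base = field_type[len("List_"):] if field_type.startswith("List_") else field_type
        if base not in PRIMITIVE_TYPES:
            raise ValueError(f"unknown type {field_type!r} for field {name!r}")
        fields[name] = field_type
    return fields
```

Because Python dictionaries preserve insertion order, the result lists the fields in the same order as the `def` section, which matches the spreadsheet analogy of columns.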
The URIs can be shortened by using prefixes.
The prefixes are URIs themselves, but they cannot contain colons, because the colon (`:`) is used to define the association.
prefix:
- PREFIX_0:URI_0
- PREFIX_1:URI_1
- ...
- PREFIX_n:URI_n
There must be no space before or after the colon. The prefixes are applied in declaration order during parsing and serialization.
Although serialization shortens every URI it can by using the prefixes, all of them can be expanded by adding the empty prefix with an empty value, i.e. a colon (`:`) alone, which must be the first prefix.
This can be useful to rename the prefixes.
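Applying prefixes in declaration order can be sketched as follows. This Python sketch assumes the `&PREFIX;` notation that appears in the Tabula.Properties example of this document (e.g. `&arxiv;abs/1412.2223`); the function names are hypothetical, and the special "empty prefix first" behaviour is not modeled.

```python
from typing import List, Tuple

Prefixes = List[Tuple[str, str]]  # (prefix name, URI) pairs, in declaration order


def parse_prefixes(entries: List[str]) -> Prefixes:
    """Split entries like 'arxiv:https://arxiv.org/' at the first colon."""
    prefixes: Prefixes = []
    for entry in entries:
        name, _, uri = entry.partition(":")
        prefixes.append((name, uri))
    return prefixes


def shorten(uri: str, prefixes: Prefixes) -> str:
    """Use the first declared prefix whose URI starts the value."""
    for name, base in prefixes:
        # The empty prefix (':' alone) is skipped; it is not modeled here.
        if base and uri.startswith(base):
            return f"&{name};{uri[len(base):]}"
    return uri


def expand(value: str, prefixes: Prefixes) -> str:
    """Replace a leading &name; reference by the URI it stands for."""
    for name, base in prefixes:
        marker = f"&{name};"
        if value.startswith(marker):
            return base + value[len(marker):]
    return value
```

Splitting at the first colon is what forces prefix names to be colon-free: everything after that colon, including the `https:` scheme, belongs to the URI.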
The order in which the instances are shown is defined as follows:
order:
- ('-'|'+')FIELD_NAME_a_0
- ('-'|'+')FIELD_NAME_a_1
...
- ('-'|'+')FIELD_NAME_a_k
where `+` denotes ascending order and `-` denotes descending (reverse) order.
For example:
order:
- +id
- -author
orders the instances by `id` (ascending) and then by `author` (descending).
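A multi-key order with per-key direction can be implemented with repeated stable sorts, applied from the last key to the first. The Python sketch below is illustrative only: the helper names are hypothetical, and values are compared as plain strings, whereas a real implementation might compare according to the field type.

```python
from typing import Dict, List, Tuple


def parse_order(entries: List[str]) -> List[Tuple[str, bool]]:
    """Turn entries like '+id' or '-author' into (field, descending) pairs."""
    keys = []
    for entry in entries:
        if not entry or entry[0] not in "+-":
            raise ValueError(f"order entry must start with '+' or '-': {entry!r}")
        keys.append((entry[1:], entry[0] == "-"))
    return keys


def sort_instances(instances: List[Dict[str, str]],
                   order: List[Tuple[str, bool]]) -> List[Dict[str, str]]:
    """Stable sorts from the least to the most significant key."""
    result = list(instances)
    for field, descending in reversed(order):
        # Python's sort is stable, even with reverse=True.
        result.sort(key=lambda inst, f=field: inst.get(f, ""), reverse=descending)
    return result
```

With the order `+id`, `-author`, instances with the same `id` end up sorted by `author` in descending order.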
The instances come just after the type definition, with the following syntax:
- FIELD_NAME_0: VALUE_0
FIELD_NAME_1: VALUE_1
...
FIELD_NAME_n: VALUE_n
where each FIELD_NAME is one of the field names already declared in the type, and each VALUE is a String that conforms to the field type.
The values can be any Tabula String.
Leading and trailing blanks (`'\t'` 0x09, `' '` 0x20) are removed.
To declare a multi-line value, each line except the last one must end with a backslash (`'\'` 0x5C).
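The trimming and continuation rules can be sketched as a small line-joining step. This is a hypothetical Python helper, not part of Tabulas: it groups physical lines into logical values, drops the trailing backslash, and trims blanks on each line while keeping inner spaces.

```python
from typing import List


def join_continuations(lines: List[str]) -> List[List[str]]:
    """Group physical lines into logical values.

    A line ending in a backslash continues on the next line. Each line is
    trimmed of leading and trailing blanks (tab and space), and the
    backslash itself is dropped.
    """
    values: List[List[str]] = []
    parts: List[str] = []
    for line in lines:
        stripped = line.strip(" \t")
        if stripped.endswith("\\"):
            parts.append(stripped[:-1].strip(" \t"))
        else:
            parts.append(stripped)
            values.append(parts)
            parts = []
    if parts:  # input ended with a dangling backslash
        values.append(parts)
    return values
```

For instance, the three lines `authors: \`, `Vieri Benci \`, and `Lorenzo Luperi Baglini` become one logical value with three parts.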
The formatter normalizes the values and presents them differently according to the declared type.
For example, the values of fields with type `List_`... (e.g. `List_String`) are presented as multi-line values.
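The type-dependent presentation can be illustrated with a tiny formatter. In this hypothetical Python sketch, `List_` values are written one element per line with backslash continuations, in the style of the Tabula.Properties example, and scalar values are trimmed; the exact indentation is an assumption, not the Tabulas output.

```python
from typing import List, Union


def format_field(name: str, field_type: str, value: Union[str, List[str]]) -> List[str]:
    """Render one field as Properties-style lines, normalizing by type."""
    if field_type.startswith("List_"):
        items = value if isinstance(value, list) else value.split()
        items = [item.strip(" \t") for item in items]
        if not items:
            return [f"{name}:"]
        # Every line except the last ends with a backslash.
        lines = [f"{name}: \\"]
        lines += [f" {item} \\" for item in items[:-1]]
        lines.append(f" {items[-1]}")
        return lines
    # Scalar values: trim leading and trailing blanks.
    text = str(value).strip(" \t")
    return [f"{name}: {text}"]
```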
This is an example of a library file.
Each book record contains an identifier (`id`), a title (`title`), the authors (`authors`), a link to the abstract on the web (`web`), and a list of links to the documents (`documents`).
The entries are ordered by identifier.
---
- type:
name: record
def:
- id:String
- title:String
- authors:List_String
- web:URI
- documents:List_URI
prefix:
- arxiv:https://arxiv.org/
order:
- +id
- id: arXiv:1412.2223
title: A topological approach to non-Archimedean Mathematics
authors:
- Vieri Benci
- Lorenzo Luperi Baglini
web: https://arxiv.org/abs/1412.2223
documents:
- https://arxiv.org/pdf/1412.2223#pdf
- https://arxiv.org/ps/1412.2223#ps
- https://arxiv.org/format/1412.2223#other
- id: arXiv:1412.3313
title: Infinitary stability theory
authors:
- Sebastien Vasey
web: https://arxiv.org/abs/1412.3313
documents:
- https://arxiv.org/pdf/1412.3313#pdf
- https://arxiv.org/ps/1412.3313#ps
- https://arxiv.org/format/1412.3313#other
The unit tests include an example like this one.
For example, the MainSpec class performs the following steps:
- read the example file
- add a new field `numberOfAuthors`
- add to each record the number of authors
- compare the result with the expected one
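The middle steps of that test can be sketched in Python on in-memory data. This is an illustration, not a translation of MainSpec: reading the YAML file is left out (it needs a third-party YAML parser), and both the function name and the `Integer` type of the new field are assumptions.

```python
from typing import Dict, List


def add_number_of_authors(definition: List[str], records: List[Dict[str, object]]) -> None:
    """Add a numberOfAuthors field to the type and fill it in for every record."""
    new_field = "numberOfAuthors:Integer"  # assumed type for the new field
    if new_field not in definition:
        definition.append(new_field)
    for record in records:
        authors = record.get("authors", [])
        record["numberOfAuthors"] = len(authors)
```

The final step then compares the modified records with the expected ones, e.g. `[2, 1]` for the two records of the library example.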
This project also includes some converters from and to other formats. Every deserializer (parser) and serializer (renderer) is registered as an extension. Some serializers and deserializers cannot completely map the content of a Tabula file.
serializer | stores metadata | stores entries |
---|---|---|
YAML | yes | yes |
JSON | yes | yes |
JSON Schema | yes | no |
Rx YAML | yes | no |
HTML | no | yes |
Wikitext | no | yes |
CSV | no | yes |
SQL | no | yes |
(Wikitext is a wiki markup language.)
deserializer | requires metadata |
---|---|
YAML | yes |
JSON | yes |
CSV | no |
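As the table indicates, a CSV serializer stores entries but no metadata. The Python sketch below is a hypothetical stand-in, not the Tabulas CSV renderer: it writes a header row with the field names and one row per record, and it assumes list values are flattened with spaces.

```python
import csv
import io
from typing import Dict, List


def to_csv(field_names: List[str], records: List[Dict[str, object]]) -> str:
    """Write entries only: a header row with the field names, then one row per record."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(field_names)
    for record in records:
        row = []
        for name in field_names:
            value = record.get(name, "")
            # Assumption: list values are flattened with spaces.
            row.append(" ".join(value) if isinstance(value, list) else value)
        writer.writerow(row)
    return buffer.getvalue()
```

This flattening also shows why such a conversion can lose information: field types, prefixes, and order are not stored, and two authors joined by spaces can no longer be told apart from one author with a longer name.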
The same example in the Tabula.Properties format:
# simple format 1.0.0
type:
name: record
def: \
id:String \
title:String \
authors:List_String \
web:URI \
documents:List_URI
prefix: \
arxiv:https://arxiv.org/
order: \
+id
new:
id: arXiv:1412.2223
title: A topological approach to non-Archimedean Mathematics
authors: \
Vieri Benci \
Lorenzo Luperi Baglini
web: &arxiv;abs/1412.2223
documents: \
&arxiv;pdf/1412.2223#pdf \
&arxiv;ps/1412.2223#ps \
&arxiv;format/1412.2223#other
new:
id: arXiv:1412.3313
title: Infinitary stability theory
authors: \
Sebastien Vasey
web: &arxiv;abs/1412.3313
documents: \
&arxiv;pdf/1412.3313#pdf \
&arxiv;ps/1412.3313#ps \
&arxiv;format/1412.3313#other
The unit tests also include the previous example.
Please note that there should be no spaces in the elements of the `def` section.
For example, the definition is `id:String` and not `id: String`.
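A minimal reader for the Tabula.Properties layout could look like the following Python sketch. It is a hypothetical helper, not the Tabulas deserializer: it skips `#` comments, joins backslash-continued lines, treats `type:` and `new:` as section markers (unambiguous because those words cannot be field names), and returns continued values as lists. It does not expand `&prefix;` references or check the values against the field types.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]


def parse_properties(text: str) -> Tuple[Record, List[Record]]:
    """Read the type section and the records of a Tabula.Properties text (simplified)."""
    # 1. Merge physical lines ending in a backslash into logical entries.
    entries: List[List[str]] = []
    parts: List[str] = []
    for raw in text.splitlines():
        line = raw.strip(" \t")
        if not parts and (not line or line.startswith("#")):
            continue  # blank line or comment outside a continued value
        if line.endswith("\\"):
            parts.append(line[:-1].strip(" \t"))
            continue
        parts.append(line)
        entries.append(parts)
        parts = []
    if parts:
        entries.append(parts)

    # 2. Group 'key: value' pairs under 'type:' or under each 'new:'.
    type_info: Record = {}
    records: List[Record] = []
    current: Optional[Record] = None
    for first, *rest in entries:
        key, _, value = first.partition(":")  # split at the first colon only
        key, value = key.strip(), value.strip()
        if key in ("type", "new") and not value and not rest:
            if key == "type":
                current = type_info
            else:
                current = {}
                records.append(current)
            continue
        if current is None:
            raise ValueError(f"entry before 'type:' or 'new:': {first!r}")
        if rest:
            # Continued values become lists (e.g. def, List_ fields).
            current[key] = ([value] if value else []) + rest
        else:
            current[key] = value
    return type_info, records
```

Splitting at the first colon keeps values such as `arXiv:1412.3313` intact, while the no-space rule for `def` entries keeps `id:String` a single list element.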
A YAML file can be easily converted to a JSON file using a Python script like yaml_to_json.py.
The command line application can be used to execute the different readers and writers. They are implemented as extensions. Each extension is registered when the application starts and can then be executed from the command line.
The following are some of the extensions that the application lists when it is run without parameters:
- `yaml` (input) (output): create a Tabula.YAML file
- `json` (input) (output): create a Tabula.JSON file
- `properties` (input) (output): create a Tabula.Properties file
- `oldformat` (input) (output): create an old Tabula.Properties file, i.e. one that uses the equals sign instead of the colon
The command line application can be executed with:
`java -jar (jarname) (extension) (input) (output)`
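The extension mechanism (register at startup, dispatch by name, list everything when the arguments do not match) can be pictured with a small registry. This Python sketch is only an analogy to the Scala application: the registry, the `register` decorator, and the `copy` extension are all hypothetical, and `main` expects the arguments that follow the JAR name.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: extension name -> (help text, function(input, output)).
Extension = Callable[[str, str], None]
EXTENSIONS: Dict[str, Tuple[str, Extension]] = {}


def register(name: str, help_text: str) -> Callable[[Extension], Extension]:
    """Decorator that registers an extension under a command-line name."""
    def decorator(func: Extension) -> Extension:
        EXTENSIONS[name] = (help_text, func)
        return func
    return decorator


@register("copy", "(input) (output): copy the input file unchanged")
def copy_file(input_path: str, output_path: str) -> None:
    # Illustrative extension only; it does not exist in Tabulas.
    with open(input_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        dst.write(src.read())


def main(argv: List[str]) -> int:
    """Mimic '(extension) (input) (output)'; list extensions otherwise."""
    if len(argv) != 3 or argv[0] not in EXTENSIONS:
        for name, (help_text, _) in EXTENSIONS.items():
            print(f"{name} {help_text}")
        return 1
    name, input_path, output_path = argv
    _, extension = EXTENSIONS[name]
    extension(input_path, output_path)
    return 0
```

Registering through a decorator keeps each converter independent: adding a new format only requires defining and registering one more function.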
The executable JAR file is available at the link provided in the Download section. If the project is built from its source code, the executable JAR file will be available in the location indicated by the release property of the release notes.
To clone and compile the project:
$ git clone https://github.com/julianmendez/tabulas.git
$ cd tabulas
$ mvn clean install
The created executable library, its sources, and its Javadoc will be in `tabulas-distribution/target`.
This executable JAR file requires the Scala library in the same directory.
The required version is shown in the release notes.
To compile the project offline, first download the dependencies:
$ mvn dependency:go-offline
and once offline, use:
$ mvn --offline clean install
The bundles uploaded to Sonatype are created with:
$ mvn clean install -DperformRelease=true
and then on each module:
$ cd target
$ jar -cf bundle.jar tabulas-*
and on the main directory:
$ cd target
$ jar -cf bundle.jar tabulas-parent*
The version number is updated with:
$ mvn versions:set -DnewVersion=NEW_VERSION
where NEW_VERSION is the new version.
This software is distributed under the Apache License Version 2.0.
See release notes.
In case you need more information, please contact julianmendez.