This project provides the JSON schemas for defining metadata fields and their requirements for The National Archives. It aims to standardize the structure of metadata across catalogues, facilitating interoperability and consistency in data representation.
Catalogues often contain diverse types of data, and consistent metadata structures are crucial for effective data management and searching. This JSON schema defines a standardised format for describing metadata fields, their types, and any constraints or requirements associated with each field.
- Flexibility: The schema supports a wide range of metadata fields commonly found in catalogues, including textual descriptions, numerical values, dates, and more.
- Validation: Ensures that metadata entries adhere to a predefined structure and meet specified requirements, reducing errors and inconsistencies.
- Extensibility: Easily extend the schema to accommodate additional metadata fields or custom requirements specific to different catalogues or use cases.
JSON Schema have defined keywords used to define data.
The National Archives use the defined keywords and extensions for domain specific requirements. These include
daBeforeToday
indicates a supplied date must be before now
Four schemas are used to define the metadata
Additional schemas are used to define specific use cases for metadata
The base schema defines the supported properties names (UUID
, date_late_modified
) and value types.
{
"$id": "/schema/baseSchema",
"type": "object",
"properties": {
"UUID": {
"type": "string",
"format": "uuid",
"alternateKeys": [
{
"tdrFileHeader": "UUID"
}
]
},
"date_last_modified": {
"type": "string",
"format": "date",
"alternateKeys": [
{
"tdrFileHeader": "ClientSideFileLastModifiedDate"
}
],
},
"end_date": {
"type": [
"string",
"null"
],
"format": "date",
"alternateKeys": [
{
"tdrFileHeader": "Date of the record"
}
],
"daBeforeToday": "Validates that end date is earlier than today's date"
}
.....
Definition for a UUID
- UUID - field key
- type - the value must be a string
- format - the string format must be an uuid
- tdrFileHeader - the human-readable key
Definition for date_last_modified
- date_last_modified field key
- type - value can be a string or null
- format - date - the string will be in the format 2023-11-13 (YYYY-MM-DD)
- tdrFileHeader - alternate name used for key ClientSideFileLastModifiedDate
Definition for end_date
- end_date field key
- type - value can be a string or null
- format - date - the string will be in the format 2018-11-13
- tdrFileHeader - alternate name used for key ClientSideFileLastModifiedDate
Example data
{
"UUID": "f373d856-d4ee-4b41-ae89-d7327915c73e",
"file_path":"file:///C:/atransfer/content/interesting.pdf",
"foi_exemption_code":[],
"date_last_modified": "2001-12-03",
"description": "description for catalogue"
}
The closure schema defines the schema for closure information. The schema enforces the presence of several fields and their values that are dependant upon other field values.
"allOf": [
{
"if": {
"properties": {
"closure_type": {
"pattern": "open_on_transfer|OPEN"
}
}
},
"then": {
"properties": {
"closure_period": {
"type": [
"number",
"null"
],
"maximum": 0
},
"closure_start_date": {
"type": [
"string",
"null"
],
"maxLength": 0
},
......
},
"else": {
"properties": {
"foi_exemption_asserted": {
"type": "string",
"format": "date"
},
"closure_type": {
"enum": [
"Closed",
"closed",
"CLOSED"
],
......
},
},
"required": [
"closure_period",
"closure_start_date",
"description_closed",
"foi_exemption_asserted",
"foi_exemption_code"
]
Multiple if/then/else statements allowed
- if closure_type OPEN
- Then
- closure_period must not be set or 0
- no closure_start_date
- ....
- else
- there must be a foi_exemption_asserted date
- the closure_type must be one of
closed_review
,closed_for
orCLOSED
- ...
- there must be values for
closure_period
,closure_start_date
,description_closed
,foi_exemption_asserted
, andfoi_exemption_code
.
- Then
This schema is used to enforce cross attribute relationships.
{
"if": {
"properties": {
"file_name_translation": {
"type": "string",
"minLength": 1
}
}
},
"then": {
"properties": {
"file_name_translation_language": {
"type": "string",
"minLength": 1
}
},
"required": ["file_name_translation_language"]
}
}
If there is a file_name_translation
then there must be a file_name_translation_language
.
This schema defines the required fields in a metadata file uploaded to TDR
The data load SharePoint schema defines what properties are permitted when loading metadata directly from SharePoint.
It is a sub-set of the Base Schema properties.
Example data:
{
"date_last_modified": "2001-12-12",
"client_side_checksum": "8b9118183f01b3df0fc5073feb68f0ecd5a7f85a88ed63ac7d0d242dc2aba2ea",
"file_size": 26,
"file_path": "a/filepath/filename1.docx",
"UUID": "b8b624e4-ec68-4e08-b5db-dfdc9ec84fea"
}
The tdr-metadata-validation produces errors with the following properties:
validationProcess
property
errorKey
For user friendly messages see Validation-message.properties file
The format is {validationProcess}.{property}.{errorKey}={User friendly message}
To use the JSON schema in your project, follow these steps:
- Download: Clone this repository or download the
*schema.json
files directly. - Integration: Integrate the
*schema.json
files into your project where metadata validation is required. - Validation: Use JSON schema validation libraries in your preferred programming language to validate metadata objects against the provided schema.
An example using scala and the networknt json-schema-validator library is shown in BaseSpec.scala.
// load schema
val schemaPath = "metadata-schema/relationshipSchema.schema.json"
val schemaInputStream: InputStream = Files.newInputStream(new File(schemaPath).toPath)
val schema = getJsonSchemaFromStreamContentV7(schemaInputStream)
// load data
val dataPath = "/data/relationship.json"
val dataInputStream = getClass.getResourceAsStream(dataPath)
val node = getJsonNodeFromStreamContent(dataInputStream)
// validate data
val errors: util.Set[ValidationMessage] = schema.validate(node.toPrettyString, InputFormat.JSON)
To publish the schemas locally, run the following commands from the repository directory:
$ sbt copySchema copyValidationMessageProperties package publishLocal
Other sbt projects that have this project as a dependency can access the local snapshot version by changing the version number in their build.sbt or dependencies file, for example:
... other dependencies...
"uk.gov.nationalarchives" % "da-metadata-schema_3" % "[version number]-SNAPSHOT"
... other dependences...