A push-style validator for JSON-like structures, built on upickle's
Visitor framework. Cross-platform (JVM and Scala.js); validates the instance as it's parsed,
without building an instance AST. Passes the official
JSON Schema Test Suite for draft
2020-12 end-to-end (mandatory + optional format), with regression assertions on output shape.

JVM:

```scala
// sbt
libraryDependencies += "io.github.jam01" %% "json-schema" % "0.2.0"
// Mill
ivy"io.github.jam01::json-schema::0.2.0"
```

```xml
<!-- Maven -->
<dependency>
  <groupId>io.github.jam01</groupId>
  <artifactId>json-schema_3</artifactId>
  <version>0.2.0</version>
</dependency>
```

Scala.js:

```scala
// sbt
libraryDependencies += "io.github.jam01" %%% "json-schema" % "0.2.0"
// Mill
ivy"io.github.jam01::json-schema::0.2.0"
```

```xml
<!-- Maven -->
<dependency>
  <groupId>io.github.jam01</groupId>
  <artifactId>json-schema_sjs1_3</artifactId>
  <version>0.2.0</version>
</dependency>
```

ujson is not a direct dependency; bring your own upickle Transformer. Examples below use ujson.

The Scala 3 / JVM + Scala.js story is the obvious uniqueness, but the design choices below are worth considering against any modern validator:
| Capability | this lib | networknt (JVM) | ajv (JS) | jsonschema-rs (Rust) | python-jsonschema |
|---|---|---|---|---|---|
| Draft 2020-12, official test suite | ✅ | ✅ | ✅ | ✅ | ✅ |
| All four spec output formats | ✅ Flag / Basic / Detailed / Verbose | partial | own format (not spec-aligned) | partial | partial |
| Annotations as first-class output + dependency API | ✅ AllowList, findAnnotatingUnits, vocab-level | partial | not in standard output | partial | partial |
| Correct unevaluated* across $ref / applicators / if/then/else | ✅ | ✅ | ✅ | ✅ | partial |
| Push-style streaming (no instance AST built) | ✅ | ❌ (Jackson tree) | ❌ | ❌ (serde_json::Value) | ❌ |
| Cross-platform native (single library, multiple targets) | ✅ JVM + Scala.js | ❌ JVM only | ❌ JS only | ❌ Rust only | ❌ Python only |
| Vocabulary extension at the spec-vocabulary level | ✅ public Vocab/VocabFactory | keyword-level | keyword-level (different model) | keyword-level | ❌ |
| Scala 3 native | ✅ | n/a | n/a | n/a | n/a |
What this lib does not claim against the field: adoption (networknt and ajv are widely deployed); measured throughput (jsonschema-rs and pre-compiled ajv are very fast; this library has no published benchmarks yet); and breadth of supported drafts (2020-12 only — most listed alternatives support draft-04 through 2020-12).
Where it lands: if you want a Scala 3 native validator that runs on both JVM and the browser, treats annotations as part of the contract, and lets you stream large instances or plug in your own vocabularies — this is the most aligned option available.

```scala
import io.github.jam01.json_schema.{Schema, OutputUnit}
import io.github.jam01.json_schema as js

val sch: Schema = js.from(ujson.Readable, ujson.Readable.fromString("""{"type":"string"}"""))
val r: OutputUnit = sch.validate(ujson.Readable, ujson.Readable.fromString(""""hello""""))
assert(r.vvalid)
```

`sch.validate` builds a one-shot validator and applies it. For repeated validation against the
same schema, build the validator once (next section).


```scala
import io.github.jam01.json_schema.{Schema, OutputUnit, Config}
import io.github.jam01.json_schema as js
import upickle.core.Visitor

val sch: Schema = js.from(ujson.Readable, ujson.Readable.fromString("""{"type":"string"}"""))
val v: Visitor[?, OutputUnit] = js.validator(sch)
val r1 = ujson.Str("foo").transform(v)
val r2 = ujson.Str("bar").transform(v)
```

The returned visitor is not thread-safe, but is safe for repeated sequential
`.transform(...)` calls, including after a `ValidationException` (the per-traversal state
resets at the end of each root scope).

Under the default Config(ffast = true), a failed validation throws ValidationException;
the wrapped OutputUnit is on e.result. Disable by passing Config(ffast = false) to get the
unit back through the normal return.
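
The two modes described above can be sketched side by side. This is a minimal illustration, not a definitive recipe: it assumes the library and ujson are on the classpath, and that `ValidationException` is importable from the `io.github.jam01.json_schema` package (the package location is an assumption; the text only confirms the class name and its `result` member).

```scala
import io.github.jam01.json_schema.{Schema, OutputUnit, Config, ValidationException}
import io.github.jam01.json_schema as js

val sch: Schema = js.from(ujson.Readable, ujson.Readable.fromString("""{"type":"string"}"""))

// Default config (ffast = true): an invalid instance surfaces as an exception,
// and the failing unit is read from e.result.
val fromException: Option[OutputUnit] =
  try { ujson.Num(1).transform(js.validator(sch)); None }
  catch { case e: ValidationException => Some(e.result) }

// ffast = false: the failing unit comes back through the normal return value.
val returned: OutputUnit =
  ujson.Num(1).transform(js.validator(sch, Config(ffast = false)))
```

Which mode fits depends on the caller: the exception path suits request pipelines that abort on the first error, while the return path suits tooling that wants to inspect the whole result.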
Schemas referenced via $ref are resolved through a Registry. To validate a schema that
references others, populate one MutableRegistry with every schema and pass it to both from
and the validator:

```scala
import io.github.jam01.json_schema.{MutableRegistry, Config}
import io.github.jam01.json_schema as js

val reg = new MutableRegistry
val userSch = js.from(ujson.Readable, ujson.Readable.fromString(userSchemaJson), registry = reg)
val orderSch = js.from(ujson.Readable, ujson.Readable.fromString(orderSchemaJson), registry = reg)
val v = js.validator(orderSch, Config.Default, registry = reg)
```

`Registry` is read-only; `MutableRegistry` extends `Registry`. Pre-populate it once, share it
across validators.

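
The registry example leaves `userSchemaJson` and `orderSchemaJson` undefined. Purely as an illustration, they might look like the following. The `$id` URIs and property names are hypothetical, and the sketch assumes the order schema refers to the user schema by its absolute `$id`.

```scala
// Hypothetical schema documents for the registry example; URIs and fields are made up.
val userSchemaJson: String =
  """{
    |  "$id": "https://example.com/schemas/user",
    |  "type": "object",
    |  "properties": { "name": { "type": "string" } },
    |  "required": ["name"]
    |}""".stripMargin

// The order schema points at the user schema through $ref.
val orderSchemaJson: String =
  """{
    |  "$id": "https://example.com/schemas/order",
    |  "type": "object",
    |  "properties": {
    |    "buyer": { "$ref": "https://example.com/schemas/user" },
    |    "total": { "type": "number" }
    |  }
    |}""".stripMargin
```

Both documents have to be loaded into the same registry before the order validator is built, otherwise the `$ref` has nothing to resolve against.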
Selected via Config(format = …). Mirrors the four formats described in
JSON Schema 2020-12 §12.4.
| Format | Shape |
|---|---|
| `Flag` | Single root unit with `valid` only. Cheapest. Default. |
| `Basic` | Single root unit; `details` is a flat list of keyword-level units. |
| `Detailed` | Hierarchical, retains only error units and annotated successes. |
| `Verbose` | Hierarchical, retains every unit. |
Example. Schema {"properties":{"name":{"type":"string","minLength":3}}} against
{"name":"ab"}, rendered with OutputUnitW:

`Flag`:

```json
{ "valid": false, "keywordLocation": "", "instanceLocation": "" }
```

`Detailed`:

```json
{
  "valid": false, "keywordLocation": "", "instanceLocation": "",
  "details": [{
    "valid": false, "keywordLocation": "/properties", "instanceLocation": "",
    "details": [{
      "valid": false, "keywordLocation": "/properties/name", "instanceLocation": "/name",
      "details": [{
        "valid": false, "keywordLocation": "/properties/name/minLength", "instanceLocation": "/name",
        "error": "String length 2 is less than minimum 3"
      }]
    }]
  }]
}
```

`Basic` produces the same set of keyword units but flat at the root.
`Verbose` additionally retains every successful unit. Use `OutputUnitW.transform(unit, ujson.StringRenderer())`
to serialize.

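
To make the relationship between the hierarchical and flat shapes concrete, here is a toy model in plain Scala. `ToyUnit`, `flatten` and `toBasic` are hypothetical names for this sketch and are not the library's `OutputUnit` API. The sketch assumes every nested unit is kept when flattening; which units the library actually keeps in `Basic` is not something it claims.

```scala
// Toy output unit; field names mirror the JSON shown above, not the library's types.
final case class ToyUnit(
    valid: Boolean,
    keywordLocation: String,
    instanceLocation: String,
    error: Option[String] = None,
    details: List[ToyUnit] = Nil)

// Pre-order walk: emit each nested unit without its children, then descend into it.
def flatten(u: ToyUnit): List[ToyUnit] =
  u.details.flatMap(d => d.copy(details = Nil) :: flatten(d))

// Basic-style view: a single root whose details form a flat list.
def toBasic(root: ToyUnit): ToyUnit = root.copy(details = flatten(root))

// The Detailed tree from the example above.
val detailed = ToyUnit(false, "", "", details = List(
  ToyUnit(false, "/properties", "", details = List(
    ToyUnit(false, "/properties/name", "/name", details = List(
      ToyUnit(false, "/properties/name/minLength", "/name",
        error = Some("String length 2 is less than minimum 3"))
    ))
  ))
))

val basic = toBasic(detailed)
```

After flattening, `basic.details` holds three units in document order, none of which carries nested `details`, and the `minLength` error message is preserved on its unit.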
Beyond pass/fail, JSON Schema defines an annotation mechanism: keywords that succeed can attach a value to
their evaluation site, observable downstream by the user and by sibling keywords. This is what
makes unevaluatedItems / unevaluatedProperties correct in the first place, and what powers things like
form generation, doc rendering, or routing decisions driven by schema metadata.
This library treats annotations as first-class. Specifically:
- Spec-compliant emission across all four output formats: `Detailed` retains successful units that carry annotations; `Verbose` retains every successful unit; `Basic` flattens annotated units at the root; `Flag` drops them. See JSON Schema 2020-12 §7.7.
- Correct cross-keyword propagation: `unevaluatedItems` and `unevaluatedProperties` see the annotations produced by sibling `properties`/`patternProperties`/`additionalProperties`/`prefixItems`/`items`/`contains`, including across `$ref`, `allOf`/`anyOf`/`oneOf`, and `if`/`then`/`else` branches.
- Pruning on invalidated branches: when a `oneOf` arm fails, an `if` selects the other branch, or an applicator otherwise discards a result, the annotations produced by the discarded subtree are pruned before downstream keywords consume them. This is the kind of thing many implementations get wrong.
- `AllowList` for tuning what lands in the output: `Config(allowList = …)`. Choices: `KeepAll`, `DropAll` (default), `Keep(Set[String])`, `Drop(Set[String])`. Useful when you want, say, `title`/`description`/`x-*` for downstream tooling but not the `prefixItems` index annotations.
- `OutputUnit.findAnnotatingUnits(insLoc, keyword)` for extracting annotations from a result tree without re-traversing it yourself.
Custom vocabularies plug into the same mechanism: emit an annotation via mkUnit(..., annotation = …),
and declare a dependency on others via ctx.registerDependant(…) + ctx.getDependenciesFor(…). See
the Unevaluated vocab for the canonical pattern.
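
The pruning rule is easier to see on a toy model. The sketch below is plain Scala and does not use the library's types: `BranchResult`, `survivingAnnotations` and `unevaluated` are hypothetical names. It assumes each branch reports whether it passed and which property names it annotated as evaluated.

```scala
// Outcome of one subschema branch: did it pass, and which property names
// did it annotate as evaluated?
final case class BranchResult(valid: Boolean, evaluated: Set[String])

// Annotations from failed branches are pruned; only surviving ones are merged.
def survivingAnnotations(branches: Seq[BranchResult]): Set[String] =
  branches.filter(_.valid).flatMap(_.evaluated).toSet

// Property names an unevaluatedProperties-style keyword would still have to check.
def unevaluated(instanceKeys: Set[String], branches: Seq[BranchResult]): Set[String] =
  instanceKeys -- survivingAnnotations(branches)

val branches = Seq(
  BranchResult(valid = true, evaluated = Set("name")),
  BranchResult(valid = false, evaluated = Set("legacyId")) // discarded arm
)

val leftover = unevaluated(Set("name", "legacyId"), branches)
```

Here `leftover` is `Set("legacyId")`: the failed arm did look at `legacyId`, but its annotation is discarded, so the property still counts as unevaluated. An implementation that merged annotations before checking branch validity would wrongly treat it as covered.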
format is annotation-only by default. To make it assert, opt in via Dialect.FormatAssertion:

```scala
import io.github.jam01.json_schema.{Config, Dialect}
import io.github.jam01.json_schema as js

val v = js.validator(sch, Config(dialect = Dialect.FormatAssertion))
```

Asserted formats: `date-time`, `date`, `time`, `duration`, `email`, `idn-email`, `hostname`,
`idn-hostname`, `ipv4`, `ipv6`, `uuid`, `uri`, `uri-reference`, `iri`, `iri-reference`,
`uri-template`, `json-pointer`, `relative-json-pointer`, `regex`.

Known limitations:
- `duration` does not reject `P…W` combined with non-week units (the ISO 8601 ambiguity is left to upstream `java.time`).
- `idn-hostname` / `idn-email` are fully compliant on JVM but only best-effort on Scala.js; see Scala.js limitations below.

The validator pushes the instance through a Visitor without ever building an instance-side AST.
Any upickle Transformer/Readable source works — including a java.io.InputStream:

```scala
import io.github.jam01.json_schema as js
import java.nio.file.{Files, Paths}

val v = js.validator(sch)
val in = Files.newInputStream(Paths.get("large.json"))
try ujson.InputStreamParser.transform(in, v)
finally in.close()
```

Under `ffast = true` (default), an invalid path short-circuits as soon as the failing element is
seen; the parser does not pull further bytes from the source. (See
`shared/src/test/scala/.../StreamingTest.scala` for the regression that proves it.)
The same applies to any non-streaming Readable — there is no intermediate reification just for
validation; only a handful of keywords (contains, const, enum) buffer locally and only as
much as the keyword needs.
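
The short-circuit behaviour can be illustrated without the library. The following is a toy sketch in plain Scala, not the library's implementation: `ToyValidationFailure`, `CountingIterator` and `validateStream` are hypothetical names, and an `Iterator` stands in for the byte source.

```scala
// Hypothetical failure type for this sketch; the library's own exception is ValidationException.
final class ToyValidationFailure(val index: Int)
  extends RuntimeException(s"element $index is invalid")

// Wraps a source and counts how many elements were actually pulled from it.
final class CountingIterator[A](underlying: Iterator[A]) extends Iterator[A]:
  var pulled = 0
  def hasNext: Boolean = underlying.hasNext
  def next(): A = { pulled += 1; underlying.next() }

// Push each element through the check as it arrives; nothing is buffered,
// and the first failure stops consumption of the source.
def validateStream[A](source: Iterator[A])(isValid: A => Boolean): Unit =
  var i = 0
  while source.hasNext do
    if !isValid(source.next()) then throw new ToyValidationFailure(i)
    i += 1

val source = new CountingIterator(Iterator(1, 2, -3, 4, 5))

val failedAt: Option[Int] =
  try { validateStream(source)(_ > 0); None }
  catch { case e: ToyValidationFailure => Some(e.index) }
```

After this runs, `failedAt` is `Some(2)` and `source.pulled` is `3`: the two elements after the failing one are never read, which is the property the fail-fast mode relies on for large inputs.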
A vocabulary is a set of keywords with a single VocabFactory companion. To add one: extend
VocabBase, override the visit* methods for the JSON node types your keyword applies to,
declare the factory, and put it in a Dialect.
VocabBase provides Nil-returning defaults for every visit* method, so a string-only
keyword needs to override only visitString:

```scala
import io.github.jam01.json_schema.*

final class StartsWith(schema: ObjectSchema, ctx: Context, path: JsonPointer, dynParent: Option[Vocab[?]])
  extends VocabBase(schema, ctx, path, dynParent) {
  private val prefix: String = schema.getString("startsWith").get

  override def visitString(s: CharSequence, index: Int): Seq[OutputUnit] = {
    val valid = s.toString.startsWith(prefix)
    Seq(mkUnit(valid, "startsWith",
      error = if (valid) null else s"""string does not start with "$prefix""""))
  }
}

object StartsWith extends VocabFactory[StartsWith] {
  override def uri: String = "https://example.com/vocab/starts-with"
  override def shouldApply(schema: ObjectSchema): Boolean = schema.value.contains("startsWith")
  override def create(schema: ObjectSchema, ctx: Context, path: JsonPointer, dynParent: Option[Vocab[?]]) =
    new StartsWith(schema, ctx, path, dynParent)
}
```

Wire it into a dialect and validate:


```scala
import io.github.jam01.json_schema.{Config, Dialect, Uri}
import io.github.jam01.json_schema as js

val customDialect = Dialect(
  Uri("https://example.com/dialect"),
  Dialect.FullSpec.vocabularies :+ StartsWith)

val sch = js.from(ujson.Readable, ujson.Readable.fromString("""{"startsWith":"user-"}"""))
val r = ujson.Str("user-42").transform(js.validator(sch, Config(dialect = customDialect)))
assert(r.vvalid)
```

Tips when writing a vocab:

- The built-in `vocab/Format.scala` and `vocab/Metadata.scala` are the smallest reference implementations and useful as templates.
- Use `mkUnit` for one-shot units and `accumulate(buff, …)` when emitting several from the same scope; both already respect the configured `OutputFormat` and `AllowList`.
- Annotation-dependent keywords (think `unevaluatedItems`) coordinate through the `Context`: call `ctx.registerDependant(…)` in the constructor and `ctx.getDependenciesFor(…)` at `visitEnd`. See `vocab/Unevaluated.scala` for the pattern.

The _sjs1_3 artifact aims for feature parity with the JVM artifact, with one documented
exception:
- `format: idn-hostname` and `format: idn-email` are validated structurally on Scala.js (label length, character category, hyphen placement, total length) but not against the full IDNA 2008 / RFC 5892 tables, Punycode (`xn--…`) decoding, or RFC 5893 Bidi rules. The JVM target uses `com.networknt`'s RFC 5892 implementation for full conformance. If you need that, validate on the JVM. See `js/src/main/scala/.../vocab/Idn.scala` for the precise list of checks performed.
- Implements JSON Schema 2020-12. Earlier drafts are not supported.
- Pre-1.0 — the user-facing API may change between minor versions.
- The full JSON Schema Test Suite for draft 2020-12 (mandatory + optional
format) runs in CI.