This repository contains a small Scala 3 parser and decoder for a constrained, data-oriented subset of Scala syntax.
Currently the supported use case is decoding data from text into one of:
- a syntax tree (`Expr`),
- a Scala type that directly matches the structure as written (preserving nested named tuple types),
- or a custom type via an advanced decoder.
Decoding into custom types is theoretically faster than from JSON, because field order is significant and repeated field names are forbidden.
This can be used as an alternative configuration file format for applications. Encoders are not yet supported, but they are an area of interest.
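To illustrate why a significant field order can help, here is a minimal sketch, not the library's actual implementation: `Field`, `decodePositional`, and `decodeByName` are hypothetical names. With a fixed order and no repeated names, a decoder can verify field names positionally in a single pass, whereas JSON-style decoding must look each field up by name.

```scala
// Hypothetical simplified model of a parsed object: an ordered list of fields.
final case class Field(name: String, value: Any)

// Order-significant decoding: one positional comparison, no map allocation.
def decodePositional(fields: List[Field], expected: List[String]): Either[String, List[Any]] =
  if fields.map(_.name) == expected then Right(fields.map(_.value))
  else Left(s"fields do not match expected order: ${expected.mkString(", ")}")

// JSON-style decoding: build a name -> value map, then look up each expected field.
def decodeByName(fields: List[Field], expected: List[String]): Either[String, List[Any]] =
  val byName = fields.map(f => f.name -> f.value).toMap // extra pass and allocations
  expected.foldLeft(Right(List.empty[Any]): Either[String, List[Any]]) { (acc, n) =>
    for vs <- acc; v <- byName.get(n).toRight(s"missing field $n") yield vs :+ v
  }
```

Both return the field values in target order; the positional variant simply rejects any reordering instead of compensating for it.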
A source file deliberately supports only syntax that is valid with the default imports and classpath of Scala 3.8.1 (so no references to external classes).
Only expressions of the following types are allowed:
- string literals (and `+` concatenation)
- `Int`, `Long`, `Float`, and `Double` literals
- `null`, `true`, or `false`
- named tuple literals (for structured objects)
- `Vector(...)` literals (for random-access sequences)
```scala
/** comments are valid too! */
val conf = (
  x = (
    label = "abc" + "def",
    ys = Vector(1, 2L, -0x1A, 3.14f, 1_000_000)
  ),
  y = null,
  // z = temp removal
  ok = true
)
```

A source file consists of a single top-level `val` declaration (of any name), meaning it should be a valid `.scala` file.
If tooling permits, we could envision a `.scon` file format that consists only of an expression, with no declaration.
Not supported:
- multiple declarations in one file
- arbitrary Scala expressions
- methods, classes, imports, or type definitions
- general collection syntax beyond `Vector(...)`
- string interpolation or advanced string forms
The core module exposes two main flows.
Parse and decode directly into a type that matches the structure of the data, using a contextual `TaggedDecoder[T]`:

```scala
import scalanotation.*

type Data =
  (x: (label: String, xs: Vector[String], ys: IArray[Int]), y: Option[Int], ok: Boolean)

// `given TaggedDecoder[Data]` is derived automatically
val decoded: Result[Data, DecodeError] =
  Readers.readValueAs[Data](input, name = "conf")
```

You can also read as a generic syntax tree (`Expr`) first and use that, or decode into structured data afterwards:
```scala
import scalanotation.*

type Data = (ok: Boolean)

val expr = Readers.readValueAs[Expr](input, name = "conf").get
val decoded = expr.decodeAs[Data]
```

Supported typed decoding targets currently include:
- arbitrary `Repr <: AnyNamedTuple`
- `Vector[T]`, `String`, `Char`, `Int`, `Long`, `Float`, `Double`, and `Boolean`
- `Array[T]`, `IArray[T]`, and arbitrary `Arr <: scala.collection.Seq[T]`
- arbitrary `Dict <: scala.collection.Map[String, T]`
- `Option[T]` for nullable values
- `Expr` (the generic syntax tree)
- custom types via transformation of an existing `TaggedSchema[T]`
- case class, case object, and enum derived decoders via `derives TaggedSchema`
### Mapping an existing Schema

This is the preferred approach for decoding `String` values directly into domain objects such as dates, or into simple enums.
```scala
import scalanotation.*
import steps.result.Result

enum Mode:
  case Fast, Safe

given TaggedSchema[Mode] =
  summon[TaggedSchema[String]].emap {
    case "fast" => Result.Ok(Mode.Fast)
    case "safe" => Result.Ok(Mode.Safe)
    case other  => Result.Err(DecodeError.Custom(s"Unknown mode '$other'"))
  }
```

### Derived Schemas

For product and sum types, you can derive a decoder automatically (semi-automatic derivation); it is composed of existing schemas for each field.
```scala
import scalanotation.*
import steps.result.Result, Result.eval.raise
import java.time.LocalDate

case class Metadata(created: LocalDate, tags: Vector[String]) derives TaggedSchema
case class User(name: String, age: Int, metadata: Metadata) derives TaggedSchema

given TaggedSchema[LocalDate] =
  summon[TaggedSchema[String]].emap { raw =>
    Result:
      try LocalDate.parse(raw)
      catch case _: java.time.format.DateTimeParseException =>
        raise(DecodeError.Custom(s"Invalid ISO date '$raw'"))
  }
```

Note: for a case class with no fields, or a case object, the payload should be a named tuple with a single field: the class label, with a `null` value. For example, `case class Foo()` will derive a decoder for `(Foo = null)`.
Derivation also supports enums. Each case is represented as a single-field object whose field name is the case label. Cases with fields use a nested named tuple payload, and nullary cases use `null`.
```scala
import scalanotation.*
import steps.result.Result, Result.eval.raise
import java.time.LocalDate

enum Mode derives TaggedSchema:
  case Fast
  case Scheduled(at: LocalDate, retries: Int)

given TaggedSchema[LocalDate] =
  summon[TaggedSchema[String]].emap { raw =>
    Result:
      try LocalDate.parse(raw)
      catch case _: java.time.format.DateTimeParseException =>
        raise(DecodeError.Custom(s"Invalid ISO date '$raw'"))
  }

// Fast      => (Fast = null)
// Scheduled => (Scheduled = (at = "2026-03-15", retries = 2))
```

Typed decoding is strict:
- the requested root declaration name must match
- named tuple field order must match the target type
- field count and field names must match exactly
- decode errors include nested path information such as `.items[0].value`
- token-based parsing errors include line and column information
### Why no `val data = Foo(23)` or `val data = Bar` in a source file?

It is still possible to derive decoders automatically for these types; the syntax may just be awkward for enums and singleton objects, since a discriminator is needed (e.g. `(Bar = null)` and `(Foo = (id = 23))`).
The envisioned use case is to read a valid Scala file with no imports needed, i.e. to encode raw data with the syntax of Scala. Permitting references to external classes would mean we could no longer copy-paste the raw data into any Scala file (ignoring import overrides): we would now have to resolve external references.
Opening such a config file would also render the presentation compiler useless, so we would be required to build a more specialised tool that expects unknown references.
How would a generic tool such as a "jq for Scala Object Notation" traverse such objects without type information?
Likely `Foo(23)` would be illegal, as it could be ambiguous with sequence syntax, so only `Foo(id = 23)` would be allowed, i.e. named arguments would be required, and it could then be a "labelled object literal".
Perhaps these could be traversed by a new notion of path: in addition to `.name` for a field and `[i]` for an index, perhaps `#Foo` to "cast down" to that type, with nothing needed for `Bar` as it is an opaque reference.
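A purely hypothetical sketch of what such paths might look like (none of this syntax exists today; `job` and the shapes shown are invented for illustration):

```
.x.label          // field access, as supported today
.x.ys[0]          // index access, as supported today
.job#Foo.id       // hypothetical: "cast down" to a Foo object literal, then read `id`
```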
This makes the meaning of the document tougher to understand, because it is harder to preserve the illusion of "simple data": we must now ask "what version of `Foo` is it?", "what fields is it expected to have?", and "is subtyping meaningful?".
A standalone identifier `Bar` can only be treated as a reference of unknown type until decoding.
At least when decoding a literal like `(Foo = (id = 23))` into `Foo(23)`, it is more honest that the data is unversioned, and the document itself should then carry that information.
Another potential enabler could be to support explicit schemas to encode such definitions, but then they have to be included or referenced somehow in the document (again requiring tooling support).
The demo module provides a small CLI entry point:

```shell
./mill demo.run example/config.scala --name conf
```

Available options:
- `--name <value>`: required root declaration name
- `--tokens`: print the token stream before parsing
- `--json`: render the parsed value as JSON
- `--yaml`: render the parsed value as YAML
- `--safe-nums`: preserve lossy JSON numeric cases as strings where relevant
Examples:

```shell
./mill demo.run example/config.scala --name conf --tokens
./mill demo.run example/config.scala --name conf --json
./mill demo.run example/config.scala --name conf --yaml
./mill demo.run example/config.scala --name conf --json --safe-nums
```

The current project state is:
- `core`: tokenizer, AST model, parser, schema validation, and typed decoding into Scala 3 named tuples
- `demo`: a CLI that reads a config-like Scala file, optionally prints tokens, and can render the parsed value as JSON or YAML
- `example/config.scala`: a minimal input file used by the demo
The code currently lives in the scalanotation package.
This project uses Mill.
Run the full test suite:

```shell
./mill __.test
```

Compile the core module:

```shell
./mill core.compile
```

Compile the demo module:

```shell
./mill demo.compile
```

The test suite currently covers:
- AST parsing for nested tuples and vectors
- booleans, negative numbers, binary and hexadecimal literals
- line comments and nested block comments
- typed decoding into nested named tuples
- vector decoding for structured values
- root-name validation and field-order validation
- runtime decode error reporting with path and source location information
- compile-time schema derivation failures for unsupported target types
- demo JSON and YAML rendering