ISC License
Scala versions: 3.x
Scala.js versions: 1.x
Scala Native versions: 0.5

gramma

A Scala 3 parsing library with the ergonomics of scala-parser-combinators and performance close to fastparse. Supports separate lexing and parsing phases with automatic tokenization, and optional indentation-sensitive parsing.

Quick Start

import io.github.edadma.gramma.*

// 1. Define your lexer — just declare keywords and delimiters
object MyLexer extends StdLexer:
  delimiters ++= List("(", ")", "+", "-", "*", "/", ",")
  reserved ++= List("if", "then", "else")

// 2. Define your parser — ident, stringLit, numericLit are built in.
//    Bare strings automatically match keywords or delimiters.
object MyParser extends StdParsers(MyLexer):
  import scala.language.implicitConversions

  def expr(using ctx: ParseCtx): P[Int] =
    numericLit ^^ (_.toInt) |
      "if" ~> expr ~ ("then" ~> expr) ~ ("else" ~> expr) ^^ {
        case cond ~ t ~ e => if cond != 0 then t else e
      }

// 3. Parse
MyParser.parseSource("if 1 then 42 else 0")(MyParser.expr)
// Right(42)

No token types to define, no lexer rules to write. Identifiers, strings, numbers, keywords, and delimiters are handled automatically. Bare strings in parser rules match keywords or delimiters implicitly.

Installation

libraryDependencies += "io.github.edadma" %%% "gramma" % "0.0.2"

Cross-compiled for JVM, Scala.js, and Scala Native.

JSON Parser Example

A complete JSON parser in ~25 lines:

import io.github.edadma.gramma.*

sealed trait JValue
case class JObject(fields: List[(String, JValue)]) extends JValue
case class JArray(elements: List[JValue]) extends JValue
case class JString(value: String) extends JValue
case class JNumber(value: Double) extends JValue
case class JBool(value: Boolean) extends JValue
case object JNull extends JValue

object JSONLexer extends StdLexer:
  delimiters ++= List("{", "}", "[", "]", ":", ",")
  reserved ++= List("true", "false", "null")

object JSONParser extends StdParsers(JSONLexer):
  import scala.language.implicitConversions

  def value(using ctx: ParseCtx): P[JValue] =
    obj | arr | stringLit ^^ (JString(_)) |
      numericLit ^^ (n => JNumber(n.toDouble)) |
      "true" ^^ (_ => JBool(true)) |
      "false" ^^ (_ => JBool(false)) |
      "null" ^^ (_ => JNull)

  def arr(using ctx: ParseCtx): P[JValue] =
    "[" ~> repsep(value, ",") <~ "]" ^^ (JArray(_))

  def obj(using ctx: ParseCtx): P[JValue] =
    "{" ~> repsep(field, ",") <~ "}" ^^ (JObject(_))

  def field(using ctx: ParseCtx): P[(String, JValue)] =
    stringLit ~ (":" ~> value) ^^ { case k ~ v => (k, v) }

// Parse
JSONParser.parseSource("""{"name": "Alice", "age": 30}""")(JSONParser.value)

Indentation-Sensitive Parsing

Gramma supports Python-style indentation parsing. Enable it on your lexer and the library automatically emits Indent, Dedent, and Newline tokens:

object YAMLLexer extends StdLexer:
  override protected def indentSensitive: Boolean = true
  delimiters ++= List(":", "-")
  reserved ++= List("true", "false", "null")

object YAMLParser extends StdParsers(YAMLLexer):
  import scala.language.implicitConversions

  def document(using ctx: ParseCtx): P[YValue] =
    blockMap | blockList | scalar

  def blockMap(using ctx: ParseCtx): P[YValue] =
    rep1sep(mapEntry, newline) ^^ (entries => YMap(entries))

  def mapEntry(using ctx: ParseCtx): P[(String, YValue)] =
    ident ~ (":" ~> mapValue) ^^ { case k ~ v => (k, v) }

  def mapValue(using ctx: ParseCtx): P[YValue] =
    scalar | block(value)  // inline value or indented block

  def blockList(using ctx: ParseCtx): P[YValue] =
    rep1sep("-" ~> value, newline) ^^ (items => YList(items))

This parses:

name: Alice
address:
  city: Springfield
  zip: 12345
hobbies:
  - reading
  - coding

Indentation rules follow Python semantics:

  • Brackets ((), [], {}) suppress indentation — newlines inside brackets are ignored
  • Blank lines and comment-only lines are ignored
  • Inconsistent dedentation is an error
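
The Indent/Dedent logic follows the classic Python tokenizer algorithm: keep a stack of open indentation widths, push and emit Indent on a deeper line, pop and emit one Dedent per closed level on a shallower line, and reject a width that matches no open level. A stdlib-only sketch of the idea (illustrative only, not gramma's actual implementation):

```scala
// Python-style indentation tracking: one input entry per line's
// leading-whitespace width. Emits "Indent"/"Dedent" markers.
def indentTokens(widths: List[Int]): Either[String, List[String]] =
  val out   = collection.mutable.ListBuffer.empty[String]
  var stack = List(0) // open indentation levels, innermost first
  var error = Option.empty[String]

  for w <- widths if error.isEmpty do
    if w > stack.head then        // deeper line: open a block
      stack = w :: stack
      out += "Indent"
    else
      while w < stack.head do     // shallower line: close blocks
        stack = stack.tail
        out += "Dedent"
      if w != stack.head then     // lands between levels: inconsistent
        error = Some(s"inconsistent dedent to column $w")

  error.toLeft(out.toList)
```

(The real lexer also emits Newline tokens between lines at the same level and skips blank and comment-only lines, as described above.)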

Performance

Benchmarked against fastparse and scala-parser-combinators parsing JSON on all three platforms. Times in microseconds (lower is better):

JVM

Input                 gramma    fastparse   scala-combinators
Small (44 chars)      1.6 µs    0.4 µs      25 µs
Medium (1.6K chars)   13 µs     9 µs        671 µs
Large (11K chars)     96 µs     72 µs       72 µs

Scala Native (releaseFast)

Input                 gramma    fastparse   scala-combinators
Small (44 chars)      2.4 µs    0.7 µs      65 µs
Medium (1.6K chars)   61 µs     18 µs       1,650 µs
Large (11K chars)     463 µs    148 µs      170 µs

Scala.js (Node/V8)

Input                 gramma    fastparse   scala-combinators
Small (44 chars)      3.0 µs    1.3 µs      54 µs
Medium (1.6K chars)   83 µs     31 µs       1,498 µs
Large (11K chars)     627 µs    278 µs      206 µs

On small and medium inputs (typical source files), gramma is 15-50x faster than scala-parser-combinators across all platforms. fastparse remains faster thanks to its macro-generated code; gramma instead trades some raw speed for separate lexing and parsing phases with proper token types and source positions.

A 1.6K source file parses in 13 µs on JVM, 61 µs on Native, 83 µs on JS.

Combinators

Combinator            Type           Description
p ~ q                 P[A ~ B]       Sequence, returns pair for pattern matching
p ~> q                P[B]           Sequence, discard left
p <~ q                P[A]           Sequence, discard right
p | q                 P[B]           Alternation (committed choice)
p ^^ f                P[B]           Map result
rep(p)                P[List[A]]     Zero or more
rep1(p)               P[List[A]]     One or more
repN(n, p)            P[List[A]]     Exactly n repetitions
repsep(p, sep)        P[List[A]]     Zero or more with separator
rep1sep(p, sep)       P[List[A]]     One or more with separator
opt(p)                P[Option[A]]   Optional
peek(p)               Boolean        Lookahead without consuming
not(p)                Unit           Negative lookahead
leftAssoc(p, op)(f)   P[A]           Left-associative binary expressions
positioned(p)         P[A]           Stamp AST node with source position
log(p, name)          P[A]           Debug tracing (prints entry/exit)
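
Of these, leftAssoc deserves a note: it exists because a recursive-descent parser cannot express left recursion directly, so it instead parses one operand plus a list of (operator, operand) pairs and folds them from the left. A stdlib-only model of that fold (illustrating only the associativity it produces; the actual combinator's signature is the one in the table):

```scala
// Conceptual model of leftAssoc's result-building: fold a first operand
// and a list of (operator, operand) pairs from the left.
def foldLeftAssoc[A](first: A, rest: List[(String, A)])(f: (A, String, A) => A): A =
  rest.foldLeft(first) { case (acc, (op, rhs)) => f(acc, op, rhs) }

// "10 - 3 - 2" groups as (10 - 3) - 2 = 5, not 10 - (3 - 2) = 9
val result = foldLeftAssoc(10, List(("-", 3), ("-", 2)))((a, _, b) => a - b)
```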

StdLexer

StdLexer provides automatic tokenization. Declare your keywords and delimiters; identifiers, strings, and numbers are recognized automatically.

object MyLexer extends StdLexer:
  delimiters ++= List("<=", ">=", "==", "!=", "<", ">", "=", "+", "-", "*", "/")
  reserved ++= List("if", "then", "else", "true", "false")

Multi-character delimiters are matched longest-first (<= before <).
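
That longest-first rule amounts to: try candidate delimiters in decreasing order of length and take the first that is a prefix of the remaining input. A minimal sketch of the behaviour (an assumed model, not gramma's actual code):

```scala
// Longest-first delimiter matching: "<=" must win over "<" at the same position.
def matchDelimiter(input: String, delimiters: List[String]): Option[String] =
  delimiters.sortBy(-_.length).find(d => input.startsWith(d))
```

For example, matchDelimiter("<=3", List("<", "<=", "=")) yields Some("<="), whereas a first-match scan in declaration order would wrongly stop at "<".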

Customization

Override methods to customize lexer behavior:

Override           Default               Description
isIdentStart(c)    letter or _           First character of identifiers
isIdentPart(c)     letter, digit, or _   Subsequent identifier characters
stringQuote        '"'                   String literal quote character
stringEscape       '\\'                  String literal escape character
skip               whitespace            Whitespace/comment skipping
indentSensitive    false                 Enable INDENT/DEDENT tokens
customToken        None                  Hook for custom token types

StdParsers

StdParsers pairs with a StdLexer and provides built-in token matchers:

Method          Returns     Description
ident           P[String]   Match identifier, return text
stringLit       P[String]   Match string literal, return content
numericLit      P[String]   Match numeric literal, return text
keyword(word)   P[String]   Match specific keyword
delimiter(d)    P[String]   Match specific delimiter
indent          P[Unit]     Match indent token
dedent          P[Unit]     Match dedent token
newline         P[Unit]     Match newline token (same indent level)
block(p)        P[A]        Match indent ~> p <~ dedent

Bare strings are implicitly converted to keyword or delimiter matchers (requires import scala.language.implicitConversions):

// These are equivalent:
keyword("if") ~> expr ~ (keyword("then") ~> expr)
"if" ~> expr ~ ("then" ~> expr)

Committed Choice

Alternation uses committed choice: if the left branch of | consumes any input before failing, the right branch is never tried. No explicit cut operator is needed — commitment happens automatically at the point of token consumption.
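
A toy model over a token list shows the behaviour (illustrative only; gramma's internals differ): each parser reports how many tokens it consumed before failing, and alternation only falls through to the right branch when that count is zero.

```scala
// Toy committed-choice model: Right(rest) on success,
// Left(tokensConsumed) on failure.
type P = List[String] => Either[Int, List[String]]

def tok(t: String): P =
  case `t` :: rest => Right(rest)
  case _           => Left(0)

def seq(p: P, q: P): P = in =>
  p(in).flatMap(rest => q(rest).left.map(c => (in.length - rest.length) + c))

def alt(p: P, q: P): P = in =>
  p(in) match
    case Left(consumed) if consumed > 0 => Left(consumed) // committed: q never tried
    case Left(_)                        => q(in)
    case ok                             => ok

// On input ["if", "x"] the left branch consumes "if" and then fails,
// so the bare tok("if") alternative is never attempted.
val p = alt(seq(tok("if"), tok("(")), tok("if"))
```

The practical consequence for grammar design: factor out shared prefixes between alternatives rather than relying on backtracking.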

Error Reporting

Errors use the furthest failure heuristic. Every token carries a Pos from lex time, so error messages include the full source line with a caret:

3:15: expected ')'
  foo(bar, baz
              ^
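
Rendering such a message from a position is straightforward; a sketch assuming a 1-based (line, col) pair (errorMessage is a hypothetical helper, not gramma's API):

```scala
// Render "line:col: msg" plus the offending source line with a caret
// under the failing column (line and col are 1-based).
def errorMessage(source: String, line: Int, col: Int, msg: String): String =
  val text = source.linesIterator.drop(line - 1).next()
  s"$line:$col: $msg\n  $text\n  ${" " * (col - 1)}^"
```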

Advanced: Custom Lexers

For languages that need non-standard tokenization, use Lexers and TokenParsers directly:

case class Token(kind: TokenKind, text: String, pos: Pos)

object MyLexer extends Lexers:
  def nextToken(using ctx: LexCtx): P[Token] =
    skipWhitespace("//", "/*", "*/", false)
    val pos = ctx.capturePos()
    firstChar {
      case c if c.isLetter => identifier(_.isLetter, _.isLetterOrDigit) ^^ { text => ... }
      case c if c.isDigit  => decimalLit ^^ { text => ... }
      case '"'             => stringLit('"', '\\') ^^ { text => ... }
      case _               => charIn("+-*/") ^^ { c => ... }
    }

object MyParser extends TokenParsers[Token]:
  def tokenPos(token: Token): Pos = token.pos
  // ... define accept-based matchers for your token type

Building

sbt compile              # All platforms
sbt grammaJVM/compile    # JVM only
sbt grammaJS/compile     # Scala.js only
sbt grammaNative/compile # Scala Native only
sbt grammaJVM/test       # Run tests
sbt grammaJVM/run        # Run cross-platform benchmarks

License

ISC License — see LICENSE for details.