macro-peg / macro_peg   0.0.3

BSD 3-clause "New" or "Revised" License GitHub

Macro PEG: PEG with macro-like rules

Scala versions: 2.11

Macro PEG

Macro PEG extends Parsing Expression Grammars with macro-like rules and is implemented in Scala 3. It supports lambda-style macros so you can build higher-order grammars.

Grammar Overview

Whitespace handling is omitted from the grammar below for brevity.

Grammar       <- Definition* ";"
Definition    <- Identifier ("(" Arg ("," Arg)* ")")? "=" Expression ";"
Arg           <- Identifier (":" Type)?
Type          <- RuleType / "?"
RuleType      <- ("(" Type ("," Type)* ")" "->" Type)
               / (Type "->" Type)
Expression    <- Sequence ("/" Sequence)*
Sequence      <- Prefix+
Prefix        <- ("&" / "!") Suffix
               / Suffix
Suffix        <- Primary "?"
               / Primary "*"
               / Primary "+"
               / Primary
Primary       <- "(" Expression ")"
               / Call
               / Debug
               / Identifier
               / StringLiteral
               / CharacterClass
               / Lambda
Call          <- Identifier "(" Expression ("," Expression)* ")"
Debug         <- "Debug" "(" Expression ")"
Lambda        <- "(" Identifier ("," Identifier)* "->" Expression ")"
StringLiteral <- '"' (!'"' .)* '"'
CharacterClass<- '[' '^'? (!']' .)+ ']'

Features

  • Macro rules with parameters
  • Lambda macros for higher-order grammars
  • Type annotations for macro parameters
  • Multiple evaluation strategies (call by name, call by value sequential/parallel)
  • Parser combinator library MacroParsers
  • Scala 3 inline macro API InlineMacroParsers.mpeg (compile-time grammar validation, strategy selection)
  • Rich diagnostics via Diagnostic (parse, well-formedness, type-check, evaluation, generation)
  • Static grammar validation (GrammarValidator) for undefined references, nullable repetition, and left recursion
  • Packrat-style memoization in evaluator (evaluateWithDiagnostics)
  • Parser generator backend (codegen.ParserGenerator) for first-order grammars, with interpreter-backed fallback for higher-order grammars
  • Combinator ergonomics: label, cut, recover, trace, and formatted failures
  • Debug expressions for inspecting matches
  • Ruby parser (ruby.RubyParser) achieving 100% parse success on the upstream Ruby test corpus (302/302 files), with full AST (ruby.RubyAst)
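
To make two of the static checks listed above concrete, here is a minimal, self-contained sketch of how undefined references and nullable repetition could be detected. It does not use GrammarValidator and is not the library's implementation. The tiny AST (`Str`, `Ref`, `Seq2`, `Alt`, `Star`) and all function names are hypothetical, and left-recursion detection is left out to keep the sketch short.

```scala
object ValidatorSketch {
  // Hypothetical mini-AST for first-order PEG expressions (not the library's AST).
  sealed trait Exp
  case class Str(value: String) extends Exp
  case class Ref(name: String) extends Exp
  case class Seq2(left: Exp, right: Exp) extends Exp
  case class Alt(left: Exp, right: Exp) extends Exp
  case class Star(body: Exp) extends Exp

  type Grammar = Map[String, Exp]

  // All rule names referenced inside an expression.
  def refs(e: Exp): Set[String] = e match {
    case Str(_)     => Set.empty
    case Ref(n)     => Set(n)
    case Seq2(l, r) => refs(l) ++ refs(r)
    case Alt(l, r)  => refs(l) ++ refs(r)
    case Star(b)    => refs(b)
  }

  // Names that are referenced somewhere but never defined.
  def undefined(g: Grammar): Set[String] =
    g.values.flatMap(refs).toSet -- g.keySet

  // Can `e` succeed without consuming input, given the rules already known to be nullable?
  def nullable(e: Exp, known: Set[String]): Boolean = e match {
    case Str(v)     => v.isEmpty
    case Ref(n)     => known(n)
    case Seq2(l, r) => nullable(l, known) && nullable(r, known)
    case Alt(l, r)  => nullable(l, known) || nullable(r, known)
    case Star(_)    => true
  }

  // Least fixed point: iterate until the set of nullable rules stops growing.
  def nullableRules(g: Grammar): Set[String] = {
    var known = Set.empty[String]
    var changed = true
    while (changed) {
      val next = g.collect { case (n, e) if nullable(e, known) => n }.toSet
      changed = next != known
      known = next
    }
    known
  }

  // Every repetition node inside an expression.
  def stars(e: Exp): List[Star] = e match {
    case s @ Star(b) => s :: stars(b)
    case Seq2(l, r)  => stars(l) ++ stars(r)
    case Alt(l, r)   => stars(l) ++ stars(r)
    case _           => Nil
  }

  // Rules containing `e*` where `e` may match the empty string (a potential infinite loop).
  def nullableRepetitions(g: Grammar): Set[String] = {
    val known = nullableRules(g)
    g.collect { case (n, e) if stars(e).exists(s => nullable(s.body, known)) => n }.toSet
  }
}
```

For a grammar such as `Map("S" -> Seq2(Ref("A"), Ref("Missing")), "A" -> Star(Alt(Str("a"), Str(""))))`, `undefined` reports `Missing` and `nullableRepetitions` reports `A`.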

Getting Started

Add the library to your build.sbt:

libraryDependencies += "com.github.kmizu" %% "macro_peg" % "0.1.1-SNAPSHOT"

Then parse and evaluate a grammar:

import com.github.kmizu.macro_peg._

val grammar = Parser.parse("""
  S = Double((x -> x x), "aa") !.;
  Double(f: ?, s: ?) = f(f(s));
""")

val evaluator = Evaluator(grammar)
val result = evaluator.evaluate("aaaaaaaa", Symbol("S"))
println(result)
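
To see why this grammar matches exactly eight `a` characters, it can help to model the macro as an ordinary higher-order function. The sketch below is self-contained and does not use the library. The names `P`, `str`, `seq`, `eof`, and `double` are hypothetical stand-ins, and treating a parser as a function from input to remaining input is an assumption made for illustration, not a description of how the Evaluator works internally.

```scala
object MacroSketch {
  // A parser consumes a prefix of the input and returns the rest, or None on failure.
  type P = String => Option[String]

  // Match a literal string.
  def str(s: String): P =
    in => if (in.startsWith(s)) Some(in.substring(s.length)) else None

  // Sequence: run p, then run q on whatever p left over.
  def seq(p: P, q: P): P = in => p(in).flatMap(q)

  // `!.` succeeds only at the end of the input.
  val eof: P = in => if (in.isEmpty) Some(in) else None

  // Double(f, s) = f(f(s)), where f maps a parser to a parser.
  def double(f: P => P, s: P): P = f(f(s))

  // S = Double((x -> x x), "aa") !.
  val s: P = seq(double(x => seq(x, x), str("aa")), eof)

  def accepts(input: String): Boolean = s(input).isDefined
}
```

Here `f(s)` expands to `"aa" "aa"` (four characters) and `f(f(s))` repeats that twice, so `MacroSketch.accepts("aaaaaaaa")` holds while shorter or longer inputs are rejected.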

For typed diagnostics and safe construction:

val interpreterEither = Interpreter.fromSourceEither("""S = "ab";""")
val resultEither = interpreterEither.flatMap(_.evaluateEither("ac"))

For a compile-time checked grammar (Scala 3 inline macro):

import com.github.kmizu.macro_peg.InlineMacroParsers._
import com.github.kmizu.macro_peg.EvaluationStrategy

val parser = mpeg("""S = "ab" !.;""")
assert(parser.accepts("ab"))

// Useful for dynamic delimiter capture patterns (scannerless, no external lexer state).
val parser2 = mpeg(
  """S = F("<<", [A-Z]+, "\n") !.; F(Open, Delim, NL) = Delim;""",
  strategy = EvaluationStrategy.CallByValueSeq
)
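
The delimiter-capture grammar above can be read as follows, assuming that the sequential call-by-value strategy matches each argument against the input from left to right and binds each parameter to the text that argument consumed. That reading is an assumption; the source does not spell out the semantics. The self-contained sketch below hand-codes the behaviour for this one grammar, and `lit`, `upper1`, and `accepts` are hypothetical helpers, not library functions.

```scala
object CallByValueSeqSketch {
  // Match a literal at the start of `in`; return (matched text, remaining input).
  def lit(s: String)(in: String): Option[(String, String)] =
    if (in.startsWith(s)) Some((s, in.substring(s.length))) else None

  // Match [A-Z]+ at the start of `in`.
  def upper1(in: String): Option[(String, String)] = {
    val m = in.takeWhile(c => c >= 'A' && c <= 'Z')
    if (m.nonEmpty) Some((m, in.substring(m.length))) else None
  }

  // S = F("<<", [A-Z]+, "\n") !.;  F(Open, Delim, NL) = Delim;
  def accepts(input: String): Boolean = {
    val rest = for {
      (_, r1)     <- lit("<<")(input) // argument Open consumes "<<"
      (delim, r2) <- upper1(r1)       // argument Delim captures the delimiter text
      (_, r3)     <- lit("\n")(r2)    // argument NL consumes the newline
      (_, r4)     <- lit(delim)(r3)   // body: Delim must match the captured text again
    } yield r4
    rest.exists(_.isEmpty)            // `!.` requires end of input
  }
}
```

Under this reading, the input `"<<EOS\nEOS"` is accepted and `"<<EOS\nEND"` is rejected, because the body only matches the delimiter that was captured earlier.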

To generate parser source code from a first-order grammar:

import com.github.kmizu.macro_peg.codegen.ParserGenerator

val source = ParserGenerator.generateFromSource("""S = "a" "b";""")

Language Parsers

Language   Coverage        Approach
Ruby 3.x   302/302 files   Combinator (RubyParser, full AST) + Generated (GeneratedRubyParser, error reporting)
Python     planned

Ruby

RubyParser — hand-written combinator parser

Produces a full AST (ruby.RubyAst) covering classes, modules, methods, blocks, pattern matching (case/in), heredocs, string interpolation, regex, percent literals, operator precedence, assignment variants, and more.

import com.github.kmizu.macro_peg.ruby.RubyParser

val astEither = RubyParser.parse("""class User; def greet(name); "hi"; end; end""")

GeneratedRubyParser — generated from ruby.mpeg

Accepts or rejects Ruby source and reports structured errors (line:col, expected token, and rule stack).

import com.github.kmizu.macro_peg.ruby.GeneratedRubyParser

GeneratedRubyParser.parseAll("x = 1") match {
  case Right(_)   => println("ok")
  case Left(msg)  => println(msg)  // e.g. "parse error at 1:5\nexpected: ..."
}
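
The line:col part of such a message can be derived from a character offset. The following self-contained sketch shows one way to do that. `lineCol` and `format` are hypothetical helpers, and the message layout only imitates the example comment above; it is not taken from GeneratedRubyParser.

```scala
object PositionSketch {
  // Convert a 0-based character offset into a 1-based (line, column) pair.
  def lineCol(input: String, offset: Int): (Int, Int) = {
    // Clamp the offset so out-of-range values do not throw.
    val before = input.substring(0, math.min(math.max(offset, 0), input.length))
    val line = before.count(_ == '\n') + 1
    // Column counts characters since the last newline (or since the start of input).
    val col = before.length - (before.lastIndexOf('\n') + 1) + 1
    (line, col)
  }

  // Assumed layout, modelled on the "parse error at 1:5" example.
  def format(input: String, offset: Int, expected: String): String = {
    val (line, col) = lineCol(input, offset)
    s"parse error at $line:$col\nexpected: $expected"
  }
}
```

For the input `"x = 1"`, offset 4 maps to line 1, column 5.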

Corpus setup (one-time)

mkdir -p third_party/ruby3/upstream
git clone --depth 1 --filter=blob:none --sparse https://github.com/ruby/ruby.git third_party/ruby3/upstream/ruby
cd third_party/ruby3/upstream/ruby
git sparse-checkout set test/ruby bootstraptest test/prism

Running the corpus

# Combinator parser
sbt "runMain com.github.kmizu.macro_peg.ruby.RubyCorpusRunner"

# Generated parser
sbt "runMain com.github.kmizu.macro_peg.ruby.GeneratedRubyCorpusRunner"

Optional environment variables:

  • RUBY_CORPUS_TIMEOUT_MS (default: 5000)
  • RUBY_CORPUS_FAIL_SAMPLES (default: 20)
  • RUBY_CORPUS_FULL_ERROR (1 to print full formatted failures, default: first line only)
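
As an illustration of how these variables and their defaults fit together, here is a minimal sketch that reads them from an environment map. It is not the runners' actual code. `CorpusConfig` and `fromEnv` are hypothetical names, and falling back to the default on an unparsable number is an assumption.

```scala
object CorpusConfigSketch {
  // Hypothetical container for the three documented settings.
  final case class CorpusConfig(timeoutMs: Int, failSamples: Int, fullError: Boolean)

  // Takes the environment as a Map so it can be exercised without touching the real environment.
  def fromEnv(env: Map[String, String]): CorpusConfig = {
    // Read an integer variable, using the default when it is unset or not a number.
    def int(name: String, default: Int): Int =
      env.get(name).flatMap(v => scala.util.Try(v.trim.toInt).toOption).getOrElse(default)

    CorpusConfig(
      timeoutMs   = int("RUBY_CORPUS_TIMEOUT_MS", 5000),
      failSamples = int("RUBY_CORPUS_FAIL_SAMPLES", 20),
      fullError   = env.get("RUBY_CORPUS_FULL_ERROR").contains("1")
    )
  }
}
```

Calling `CorpusConfigSketch.fromEnv(sys.env)` would pick up the values from the current process environment.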

Python (planned)

Coming soon.

Release Notes

0.0.9

0.0.8

0.0.7

0.0.6

0.0.5

0.0.4

0.0.3

0.0.2

Running Tests

Execute the following command:

sbt test

License

This project is released under the MIT License.