macro-peg / macro_peg   0.0.3

BSD 3-clause "New" or "Revised" License

Macro PEG: PEG with macro-like rules

Scala versions: 2.11

Macro PEG

Macro PEG extends Parsing Expression Grammars with macro-like rules and is implemented in Scala 3. It supports lambda-style macros so you can build higher-order grammars.

Grammar Overview

Whitespace is omitted in the grammar below.

Grammar       <- Definition* ";"
Definition    <- Identifier ("(" Arg ("," Arg)* ")")? "=" Expression ";"
Arg           <- Identifier (":" Type)?
Type          <- RuleType / "?"
RuleType      <- ("(" Type ("," Type)* ")" "->" Type)
               / (Type "->" Type)
Expression    <- Sequence ("/" Sequence)*
Sequence      <- Prefix+
Prefix        <- ("&" / "!") Suffix
               / Suffix
Suffix        <- Primary "?"
               / Primary "*"
               / Primary "+"
               / Primary
Primary       <- "(" Expression ")"
               / Call
               / Debug
               / Identifier
               / StringLiteral
               / CharacterClass
               / Lambda
Call          <- Identifier "(" Expression ("," Expression)* ")"
Debug         <- "Debug" "(" Expression ")"
Lambda        <- "(" Identifier ("," Identifier)* "->" Expression ")"
StringLiteral <- '"' (!'"' .)* '"'
CharacterClass<- '[' '^'? (!']' .)+ ']'

Features

  • Macro rules with parameters
  • Lambda macros for higher-order grammars
  • Type annotations for macro parameters
  • Multiple evaluation strategies (call by name, call by value sequential/parallel)
  • Parser combinator library MacroParsers
  • Scala 3 inline macro API InlineMacroParsers.mpeg (compile-time grammar validation, strategy selection)
  • Rich diagnostics via Diagnostic (parse, well-formedness, type-check, evaluation, generation)
  • Static grammar validation (GrammarValidator) for undefined references, nullable repetition, and left recursion
  • Packrat-style memoization in evaluator (evaluateWithDiagnostics)
  • Parser generator backend (codegen.ParserGenerator) for first-order grammars, with interpreter-backed fallback for higher-order grammars
  • Combinator ergonomics: label, cut, recover, trace, and formatted failures
  • Debug expressions for inspecting matches
  • Experimental Ruby prototype (ruby.RubyFullParser bootstrap, ruby.RubySubsetParser compatibility) with AST nodes (ruby.RubyAst)
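
The packrat-style memoization listed above can be pictured with a small standalone sketch. It is not the library's evaluator: PackratSketch, Memo, and the two hard-coded rules are hypothetical names used only for illustration. The idea is that the result of a rule at a given input position is cached, so an alternative that retries the same rule at the same position costs a lookup rather than a second parse.

```scala
import scala.collection.mutable

object PackratSketch {
  // Caches rule results keyed by (rule name, input position).
  // A result is Some(next position) on success or None on failure.
  final class Memo {
    private val cache = mutable.Map.empty[(String, Int), Option[Int]]
    var evaluations = 0 // counts cache misses, for demonstration only

    def rule(name: String, pos: Int)(body: Int => Option[Int]): Option[Int] =
      cache.get((name, pos)) match {
        case Some(hit) => hit
        case None =>
          evaluations += 1
          val result = body(pos)
          cache((name, pos)) = result
          result
      }
  }

  // A <- "a"+
  def a(memo: Memo, in: String, pos: Int): Option[Int] =
    memo.rule("A", pos) { p =>
      val n = in.drop(p).takeWhile(_ == 'a').length
      if (n > 0) Some(p + n) else None
    }

  // S <- A "b" / A "c"   (both alternatives start with A at position 0)
  def s(memo: Memo, in: String): Option[Int] =
    a(memo, in, 0).filter(end => in.startsWith("b", end)).map(_ + 1)
      .orElse(a(memo, in, 0).filter(end => in.startsWith("c", end)).map(_ + 1))

  def main(args: Array[String]): Unit = {
    val memo = new Memo
    println(s(memo, "aaac"))   // Some(4)
    println(memo.evaluations)  // 1: the second alternative reused the cached result
  }
}
```

On the input "aaac" the first alternative fails at "b", the second alternative asks for A at position 0 again, and the cache answers without re-running the rule body.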

Getting Started

Add the library to your build.sbt:

libraryDependencies += "com.github.kmizu" %% "macro_peg" % "0.1.1-SNAPSHOT"

Then parse and evaluate a grammar:

import com.github.kmizu.macro_peg._

val grammar = Parser.parse("""
  S = Double((x -> x x), "aa") !.;
  Double(f: ?, s: ?) = f(f(s));
""")

val evaluator = Evaluator(grammar)
val result = evaluator.evaluate("aaaaaaaa", Symbol("S"))
println(result)
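
To see why this grammar matches exactly eight "a" characters, it can help to model the macros as ordinary higher-order functions. The sketch below is standalone and does not use the library: MacroSketch, P, lit, seq, eof, and double are hypothetical names, and a parser is assumed to be a function that returns the remaining input on success.

```scala
object MacroSketch {
  // A parser consumes a prefix of the input and returns the rest, or None on failure.
  type P = String => Option[String]

  // Match a literal string.
  def lit(s: String): P = in => if (in.startsWith(s)) Some(in.drop(s.length)) else None

  // Sequence: run p, then run q on what p left over.
  def seq(p: P, q: P): P = in => p(in).flatMap(q)

  // "!." in PEG notation: succeeds only when no input remains.
  val eof: P = in => if (in.isEmpty) Some(in) else None

  // Double(f, s) = f(f(s)): a macro that takes another macro as its argument.
  def double(f: P => P, s: P): P = f(f(s))

  // S = Double((x -> x x), "aa") !.
  val s: P = seq(double(x => seq(x, x), lit("aa")), eof)

  def main(args: Array[String]): Unit = {
    println(s("aaaaaaaa").isDefined) // true
    println(s("aaaa").isDefined)     // false
  }
}
```

Applying the lambda once to "aa" gives "aa" "aa", and applying it again to that result gives four copies of "aa", so S accepts eight characters followed by end of input.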

For typed diagnostics and safe construction:

val interpreterEither = Interpreter.fromSourceEither("""S = "ab";""")
val resultEither = interpreterEither.flatMap(_.evaluateEither("ac"))
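
One way to consume such a result is a pattern match on both sides. The sketch below assumes the values are standard scala.util.Either instances (the flatMap call suggests this, but the concrete error and success types are not shown here). EitherReport and report are hypothetical names, and the stand-in values in main only imitate what the library might return.

```scala
object EitherReport {
  // Generic over both sides, so it does not depend on the library's concrete types.
  def report[E, A](result: Either[E, A]): String = result match {
    case Right(value) => s"ok: $value"
    case Left(error)  => s"failed: $error"
  }

  def main(args: Array[String]): Unit = {
    // Stand-in values for illustration; with the library you would pass resultEither.
    val ok: Either[String, Int] = Right(2)
    val failed: Either[String, Int] = Left("no match")
    println(report(ok))     // ok: 2
    println(report(failed)) // failed: no match
  }
}
```

Because flatMap on Either short-circuits on Left, a grammar that fails to construct and an input that fails to match both arrive at the same Left branch.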

For compile-time checked grammar (Scala 3 inline macro):

import com.github.kmizu.macro_peg.InlineMacroParsers._
import com.github.kmizu.macro_peg.EvaluationStrategy

val parser = mpeg("""S = "ab" !.;""")
assert(parser.accepts("ab"))

// Useful for dynamic delimiter capture patterns (scannerless, no external lexer state).
val parser2 = mpeg(
  """S = F("<<", [A-Z]+, "\n") !.; F(Open, Delim, NL) = Delim;""",
  strategy = EvaluationStrategy.CallByValueSeq
)
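
The difference between call by name and call by value can be sketched in a few lines of standalone Scala. This is an assumed, simplified model and not the library's evaluator: StrategySketch, callByValue, and the other names are hypothetical. Under call by value the argument is matched at the call site and the parameter is bound to the text it matched; under call by name the parameter stands for the argument expression itself.

```scala
object StrategySketch {
  type P = String => Option[String]

  def lit(s: String): P = in => if (in.startsWith(s)) Some(in.drop(s.length)) else None
  def seq(p: P, q: P): P = in => p(in).flatMap(q)

  // [A-Z]+
  val upper: P = in => {
    val n = in.takeWhile(c => c >= 'A' && c <= 'Z').length
    if (n > 0) Some(in.drop(n)) else None
  }

  // Assumed call-by-value model: match the argument first, then hand the body
  // a literal parser for exactly the text that was matched.
  def callByValue(arg: P)(body: P => P): P = in =>
    arg(in).flatMap { rest =>
      val matched = in.substring(0, in.length - rest.length)
      body(lit(matched))(rest)
    }

  // A delimiter, a newline, then the same delimiter again.
  val sameDelimiter: P = callByValue(upper)(delim => seq(lit("\n"), delim))

  // Call-by-name analogue: the parameter is re-evaluated as [A-Z]+ each time.
  val anyDelimiter: P = seq(upper, seq(lit("\n"), upper))

  def main(args: Array[String]): Unit = {
    println(sameDelimiter("EOS\nEOS").isDefined) // true
    println(sameDelimiter("EOS\nXYZ").isDefined) // false
    println(anyDelimiter("EOS\nXYZ").isDefined)  // true
  }
}
```

In this model only the call-by-value version can insist that the closing delimiter equals the opening one, which is the property heredoc-like syntax needs.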

To generate parser source code from a first-order grammar:

import com.github.kmizu.macro_peg.codegen.ParserGenerator

val source = ParserGenerator.generateFromSource("""S = "a" "b";""")

For a Ruby-oriented AST parsing prototype:

import com.github.kmizu.macro_peg.ruby.RubySubsetParser
import com.github.kmizu.macro_peg.ruby.RubyFullParser

// Compatibility parser (RubySubsetParser).
val subsetAstEither = RubySubsetParser.parse("""class User; def greet(name); "hi"; end; end""")

// Bootstrap parser (RubyFullParser).
val fullAstEither = RubyFullParser.parse("""module M; if flag; :ok; end; end""")

Current prototype coverage includes:

  • class/module/def (including class superclass headers like class C < Base, singleton class class << self, and punctuated method names like empty?)
  • arrays/hashes (including label style entries like {foo: 1} and multiline comma-separated elements)
  • symbols (including variable-like symbols such as :$a, :@x, :@@y and forwarding markers :*/:**/:&)
  • if/elsif/else, unless, while, until, for ... in ... end
  • begin/rescue/ensure (+ retry)
  • postfix modifiers (stmt if cond / stmt unless cond)
  • return, self
  • instance/class/global variables (@x, @@x, $x)
  • constant-path references (A::B)
  • single/percent-quoted string literals ('x', %q{...}, %Q{...}, %{...} with nested paired delimiters)
  • percent word arrays (%w[...], %W[...])
  • regex literals (/.../, %r{...}, %r"...")
  • command-style no-parentheses calls (puts :ok, add 1, 2)
  • call keyword arguments (f(x: 1) / f x: 1 and multiline parenthesized arg lists)
  • dot-call chains (including no-arg links like user.profile.name)
  • bracket index calls (ENV["HOME"])
  • range expressions (1..5 / 1...5)
  • comparison/logical/unary/match operators (==, !=, =~, !~, <, >, &&, ||, !, +, -, and, or)
  • assignment expression in conditions (while (x = f()))
  • squiggly heredoc arguments (<<~TAG)
  • Ruby block comments (=begin ... =end)
  • -x style script preamble stripping
  • block-pass params/args (&block)
  • call-attached blocks (do/end, {})
  • newline-separated statements

To run the current parser against .rb corpus files from the upstream Ruby repository:

mkdir -p third_party/ruby3/upstream
git clone --depth 1 --filter=blob:none --sparse https://github.com/ruby/ruby.git third_party/ruby3/upstream/ruby
cd third_party/ruby3/upstream/ruby
git sparse-checkout set test/ruby bootstraptest test/prism
cd ../../..
sbt "runMain com.github.kmizu.macro_peg.ruby.RubyCorpusRunner"

Optional environment variables:

  • RUBY_CORPUS_TIMEOUT_MS (default: 1000)
  • RUBY_CORPUS_FAIL_SAMPLES (default: 20)
  • RUBY_CORPUS_FULL_ERROR (1 to print full formatted failures, default: first line only)

Release Note

0.0.9

0.0.8

0.0.7

0.0.6

0.0.5

0.0.4

0.0.3

0.0.2

Running Tests

Execute the following command:

sbt test

License

This project is released under the MIT License.