Macro PEG extends Parsing Expression Grammars with macro-like rules and is implemented in Scala 3. It supports lambda-style macros so you can build higher-order grammars.
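As a rough illustration of what "higher-order" means here, the sketch below models a parser in plain Scala as a function from input and offset to an optional end offset. Under that assumption, a macro such as `Double(f, s) = f(f(s))` becomes an ordinary higher-order function, and a lambda such as `(x -> x x)` becomes a Scala lambda. This is not Macro PEG's API. `HigherOrderSketch`, `P`, `lit`, `seq`, `eof`, `double`, and `twice` are hypothetical names, and the sketch ignores ordered choice and memoization.

```scala
object HigherOrderSketch:
  // Hypothetical parser type: (input, start offset) => end offset on success.
  type P = (String, Int) => Option[Int]

  // Match a fixed string at the current offset.
  def lit(s: String): P = (in, i) => Option.when(in.startsWith(s, i))(i + s.length)

  // Run `a`, then run `b` from wherever `a` stopped.
  def seq(a: P, b: P): P = (in, i) => a(in, i).flatMap(j => b(in, j))

  // Succeeds only at the end of the input, like `!.`.
  val eof: P = (in, i) => Option.when(i == in.length)(i)

  // Double(f, s) = f(f(s)): the argument `f` is itself a parser transformer.
  def double(f: P => P, s: P): P = f(f(s))

  // The lambda (x -> x x): whatever x matches, matched twice in a row.
  val twice: P => P = x => seq(x, x)

  // S = Double((x -> x x), "aa") !.;
  val S: P = seq(double(twice, lit("aa")), eof)

  def accepts(p: P, input: String): Boolean = p(input, 0).isDefined
```

With this model, `HigherOrderSketch.accepts(HigherOrderSketch.S, "aaaaaaaa")` succeeds because `"aa"` is doubled to `"aaaa"` and then to eight `a`s, while six or ten `a`s are rejected.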
Whitespace is omitted in the grammar below.

```
Grammar <- Definition* ";"
Definition <- Identifier ("(" Arg ("," Arg)* ")")? "=" Expression ";"
Arg <- Identifier (":" Type)?
Type <- RuleType / "?"
RuleType <- ("(" Type ("," Type)* ")" "->" Type)
          / (Type "->" Type)
Expression <- Sequence ("/" Sequence)*
Sequence <- Prefix+
Prefix <- ("&" / "!") Suffix
        / Suffix
Suffix <- Primary "?"
        / Primary "*"
        / Primary "+"
        / Primary
Primary <- "(" Expression ")"
         / Call
         / Debug
         / Identifier
         / StringLiteral
         / CharacterClass
         / Lambda
Call <- Identifier "(" Expression ("," Expression)* ")"
Debug <- "Debug" "(" Expression ")"
Lambda <- "(" Identifier ("," Identifier)* "->" Expression ")"
StringLiteral <- '"' (!'"' .)* '"'
CharacterClass <- '[' '^'? (!']' .)+ ']'
```
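One way to see how these productions nest is to mirror them as a Scala 3 enum and print a tree back into surface syntax. The sketch below is hypothetical: `Expr`, `Definition`, and `Show` are illustrative names rather than the library's AST, string literals are printed without escaping, and parentheses are added only where the grammar above would otherwise parse the text differently.

```scala
// Hypothetical AST mirroring the grammar above (not the library's own classes).
enum Expr:
  case Alt(choices: List[Expr])                 // Sequence ("/" Sequence)*
  case Sequence(items: List[Expr])              // Prefix+
  case And(e: Expr)                             // "&" Suffix
  case Not(e: Expr)                             // "!" Suffix
  case Opt(e: Expr)                             // Primary "?"
  case Star(e: Expr)                            // Primary "*"
  case Plus(e: Expr)                            // Primary "+"
  case Call(name: String, args: List[Expr])     // Identifier "(" Expression, ... ")"
  case Debug(e: Expr)                           // "Debug" "(" Expression ")"
  case Ident(name: String)
  case Str(value: String)                       // StringLiteral, printed unescaped
  case Cls(body: String)                        // CharacterClass body, e.g. "A-Z"
  case Lambda(params: List[String], body: Expr) // "(" x, y "->" Expression ")"

// Definition <- Identifier ("(" Arg ("," Arg)* ")")? "=" Expression ";"
final case class Definition(name: String, params: List[(String, Option[String])], body: Expr)

object Show:
  import Expr.*

  def show(e: Expr): String = e match
    case Alt(cs)          => cs.map(c => wrapIf(c, isAlt)).mkString(" / ")
    case Sequence(xs)     => xs.map(x => wrapIf(x, isCompound)).mkString(" ")
    case And(x)           => "&" + wrapIf(x, isPrefixOrCompound)
    case Not(x)           => "!" + wrapIf(x, isPrefixOrCompound)
    case Opt(x)           => wrapIf(x, isNotPrimary) + "?"
    case Star(x)          => wrapIf(x, isNotPrimary) + "*"
    case Plus(x)          => wrapIf(x, isNotPrimary) + "+"
    case Call(n, args)    => s"$n(${args.map(show).mkString(", ")})"
    case Debug(x)         => s"Debug(${show(x)})"
    case Ident(n)         => n
    case Str(v)           => "\"" + v + "\""
    case Cls(b)           => s"[$b]"
    case Lambda(ps, body) => s"(${ps.mkString(", ")} -> ${show(body)})"

  def showDef(d: Definition): String =
    val params =
      if d.params.isEmpty then ""
      else d.params.map((n, t) => t.fold(n)(ty => s"$n: $ty")).mkString("(", ", ", ")")
    s"${d.name}$params = ${show(d.body)};"

  // Parenthesize a subexpression when `needs` says the grammar requires it.
  private def wrapIf(e: Expr, needs: Expr => Boolean): String =
    if needs(e) then s"(${show(e)})" else show(e)

  private def isAlt(e: Expr): Boolean = e match
    case Alt(_) => true
    case _      => false
  private def isCompound(e: Expr): Boolean = e match
    case Alt(_) | Sequence(_) => true
    case _                    => false
  private def isPrefixOrCompound(e: Expr): Boolean = e match
    case Alt(_) | Sequence(_) | And(_) | Not(_) => true
    case _                                      => false
  private def isNotPrimary(e: Expr): Boolean = e match
    case Alt(_) | Sequence(_) | And(_) | Not(_) | Opt(_) | Star(_) | Plus(_) => true
    case _                                                                   => false
```

For example, `Definition("Double", List("f" -> Some("?"), "s" -> Some("?")), ...)` with the body `Call("f", List(Call("f", List(Ident("s")))))` prints as `Double(f: ?, s: ?) = f(f(s));`.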
- Macro rules with parameters
- Lambda macros for higher-order grammars
- Type annotations for macro parameters
- Multiple evaluation strategies (call by name, call by value sequential/parallel)
- Parser combinator library `MacroParsers`
- Scala 3 inline macro API `InlineMacroParsers.mpeg` (compile-time grammar validation, strategy selection)
- Rich diagnostics via `Diagnostic` (parse, well-formedness, type-check, evaluation, generation)
- Static grammar validation (`GrammarValidator`) for undefined references, nullable repetition, and left recursion
- Packrat-style memoization in the evaluator (`evaluateWithDiagnostics`)
- Parser generator backend (`codegen.ParserGenerator`) for first-order grammars, with an interpreter-backed fallback for higher-order grammars
- Combinator ergonomics: `label`, `cut`, `recover`, `trace`, and formatted failures
- Debug expressions for inspecting matches
- Experimental Ruby prototype (`ruby.RubyFullParser` bootstrap, `ruby.RubySubsetParser` compatibility) with AST nodes (`ruby.RubyAst`)
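The nullable-repetition and left-recursion checks listed above can be approximated with a standard fixpoint analysis: a rule is *nullable* if it can succeed without consuming input, `e*` loops forever in a naive PEG when `e` is nullable, and a rule is left-recursive if it can call itself before consuming anything. The sketch below is not `GrammarValidator`'s implementation. It uses a hypothetical, macro-free expression type (`G`) with string-keyed rules and reports rule names instead of the library's `Diagnostic` values.

```scala
// Hypothetical first-order expression type for the sketch (no macros or lambdas).
enum G:
  case Lit(s: String)
  case Ref(rule: String)
  case Cat(items: List[G])   // sequence
  case Alt(choices: List[G]) // ordered choice
  case Star(e: G)            // zero or more
  case Not(e: G)             // negative lookahead

object ValidatorSketch:
  import G.*
  type Grammar = Map[String, G]

  // Can `e` succeed without consuming input, given the rules already known to be nullable?
  def nullable(e: G, rules: Set[String]): Boolean = e match
    case Lit(s)  => s.isEmpty
    case Ref(r)  => rules(r)
    case Cat(xs) => xs.forall(nullable(_, rules))
    case Alt(xs) => xs.exists(nullable(_, rules))
    case Star(_) => true
    case Not(_)  => true // predicates never consume input

  // Least fixpoint: start from "no rule is nullable" and grow the set until it is stable.
  def nullableRules(g: Grammar): Set[String] =
    var known = Set.empty[String]
    var changed = true
    while changed do
      val next = g.keySet.filter(name => nullable(g(name), known))
      changed = next != known
      known = next
    known

  // Rules containing `e*` where `e` can match the empty string.
  def nullableRepetitions(g: Grammar): List[String] =
    val ns = nullableRules(g)
    def bad(e: G): Boolean = e match
      case Star(x) => nullable(x, ns) || bad(x)
      case Cat(xs) => xs.exists(bad)
      case Alt(xs) => xs.exists(bad)
      case Not(x)  => bad(x)
      case _       => false
    g.keys.toList.sorted.filter(name => bad(g(name)))

  // Rules that `e` may call at its starting position, before consuming any input.
  def leftCalls(e: G, ns: Set[String]): Set[String] = e match
    case Lit(_)  => Set.empty
    case Ref(r)  => Set(r)
    case Cat(xs) =>
      // Every nullable prefix element, plus the first element that must consume.
      val (prefix, rest) = xs.span(nullable(_, ns))
      (prefix ++ rest.headOption).flatMap(leftCalls(_, ns)).toSet
    case Alt(xs) => xs.flatMap(leftCalls(_, ns)).toSet
    case Star(x) => leftCalls(x, ns)
    case Not(x)  => leftCalls(x, ns)

  // A rule is left-recursive if a depth-first search over left calls reaches it again.
  def leftRecursive(g: Grammar): List[String] =
    val ns = nullableRules(g)
    val edges = g.transform((_, body) => leftCalls(body, ns))
    def reaches(from: String, target: String): Boolean =
      var seen = Set.empty[String]
      var stack = edges.getOrElse(from, Set.empty).toList
      var found = false
      while stack.nonEmpty && !found do
        val n = stack.head
        stack = stack.tail
        if n == target then found = true
        else if !seen(n) then
          seen += n
          stack = edges.getOrElse(n, Set.empty).toList ++ stack
      found
    g.keys.toList.sorted.filter(name => reaches(name, name))
```

Note that a rule such as `C <- !"z" C "c"` is flagged as left-recursive: the lookahead consumes nothing, so `C` calls itself at the same position.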
Add the library to your `build.sbt`:

```scala
libraryDependencies += "com.github.kmizu" %% "macro_peg" % "0.1.1-SNAPSHOT"
```

Then parse and evaluate a grammar:
```scala
import com.github.kmizu.macro_peg._

val grammar = Parser.parse("""
S = Double((x -> x x), "aa") !.;
Double(f: ?, s: ?) = f(f(s));
""")
val evaluator = Evaluator(grammar)
// Double applies the lambda twice: "aa" -> "aaaa" -> "aaaaaaaa"
val result = evaluator.evaluate("aaaaaaaa", Symbol("S"))
println(result)
```

For typed diagnostics and safe construction:
```scala
val interpreterEither = Interpreter.fromSourceEither("""S = "ab";""")
val resultEither = interpreterEither.flatMap(_.evaluateEither("ac"))
```

For a compile-time checked grammar (Scala 3 inline macro):
```scala
import com.github.kmizu.macro_peg.InlineMacroParsers._
import com.github.kmizu.macro_peg.EvaluationStrategy

val parser = mpeg("""S = "ab" !.;""")
assert(parser.accepts("ab"))

// Useful for dynamic delimiter capture patterns (scannerless, no external lexer state).
val parser2 = mpeg(
  """S = F("<<", [A-Z]+, "\n") !.; F(Open, Delim, NL) = Delim;""",
  strategy = EvaluationStrategy.CallByValueSeq
)
```

To generate parser source code from a first-order grammar:
```scala
import com.github.kmizu.macro_peg.codegen.ParserGenerator

val source = ParserGenerator.generateFromSource("""S = "a" "b";""")
```

For a Ruby-oriented AST parsing prototype:
```scala
import com.github.kmizu.macro_peg.ruby.RubySubsetParser

val astEither = RubySubsetParser.parse("""class User; def greet(name); "hi"; end; end""")
```

The bootstrap full parser is used the same way:

```scala
import com.github.kmizu.macro_peg.ruby.RubyFullParser

val astEither = RubyFullParser.parse("""module M; if flag; :ok; end; end""")
```

Current prototype coverage includes:

- `class`/`module`/`def` (including superclass headers like `class C < Base`, singleton classes `class << self`, and punctuated method names like `empty?`)
- arrays/hashes (including label-style entries like `{foo: 1}` and multiline comma-separated elements)
- symbols (including variable-like symbols such as `:$a`, `:@x`, `:@@y` and forwarding markers `:*`/`:**`/`:&`)
- `if`/`elsif`/`else`, `unless`, `while`, `until`, `for ... in ... end`
- `begin`/`rescue`/`ensure` (plus `retry`)
- postfix modifiers (`stmt if cond` / `stmt unless cond`)
- `return`, `self`
- instance/class/global variables (`@x`, `@@x`, `$x`)
- constant-path references (`A::B`)
- single-quoted and percent-quoted string literals (`'x'`, `%q{...}`, `%Q{...}`, `%{...}` with nested paired delimiters)
- percent word arrays (`%w[...]`, `%W[...]`)
- regex literals (`/.../`, `%r{...}`, `%r"..."`)
- command-style calls without parentheses (`puts :ok`, `add 1, 2`)
- call keyword arguments (`f(x: 1)` / `f x: 1` and multiline parenthesized argument lists)
- dot-call chains (including no-argument links like `user.profile.name`)
- bracket index calls (`ENV["HOME"]`)
- range expressions (`1..5` / `1...5`)
- comparison/logical/unary/match operators (`==`, `!=`, `=~`, `!~`, `<`, `>`, `&&`, `||`, `!`, `+`, `-`, `and`, `or`)
- assignment expressions in conditions (`while (x = f())`)
- squiggly heredoc arguments (`<<~TAG`)
- Ruby block comments (`=begin ... =end`)
- `-x` style script preamble stripping
- block-pass parameters/arguments (`&block`)
- call-attached blocks (`do`/`end`, `{}`)
- newline-separated statements
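Nested paired delimiters, as in `%q{a{b}c}`, are one of the items above that a plain "scan to the next closing character" rule gets wrong. A minimal depth-counting scanner is sketched below. It is hypothetical and independent of the prototype's actual grammar: it treats `\` as escaping the next character, recognizes only the `q`, `Q`, `w`, `W`, and `r` type letters, and returns the raw contents without interpreting escapes.

```scala
object PercentLiteralSketch:
  // Bracket-style openers nest; any other delimiter closes with itself.
  private val closers = Map('{' -> '}', '(' -> ')', '[' -> ']', '<' -> '>')

  // Scan a percent literal starting at `start` (which must be '%').
  // Returns the raw contents and the index just past the closing delimiter.
  def scan(src: String, start: Int): Option[(String, Int)] =
    if start >= src.length || src(start) != '%' then return None
    var i = start + 1
    if i < src.length && "qQwWr".contains(src(i)) then i += 1 // optional type letter
    if i >= src.length then return None
    val open = src(i)
    val close = closers.getOrElse(open, open)
    var depth = 1
    var j = i + 1
    while j < src.length do
      val c = src(j)
      if c == '\\' then j += 2 // skip the escaped character
      else
        if c == close then depth -= 1
        else if c == open && open != close then depth += 1
        if depth == 0 then return Some((src.substring(i + 1, j), j + 1))
        j += 1
    None
```

Counting depth only when the opener differs from the closer keeps `%r"..."` and similar self-closing delimiters from being treated as nested.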
To run the upstream Ruby `.rb` corpus files against the current parser:

```sh
mkdir -p third_party/ruby3/upstream
git clone --depth 1 --filter=blob:none --sparse https://github.com/ruby/ruby.git third_party/ruby3/upstream/ruby
cd third_party/ruby3/upstream/ruby
git sparse-checkout set test/ruby bootstraptest test/prism
cd ../../../..  # back to the repository root (four levels up)
sbt "runMain com.github.kmizu.macro_peg.ruby.RubyCorpusRunner"
```

Optional environment variables:
- `RUBY_CORPUS_TIMEOUT_MS` (default: `1000`)
- `RUBY_CORPUS_FAIL_SAMPLES` (default: `20`)
- `RUBY_CORPUS_FULL_ERROR` (`1` to print full formatted failures; default: first line only)
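A runner configured this way typically reads each variable once and falls back to the documented default. The sketch below is hypothetical and is not `RubyCorpusRunner`'s code: `CorpusConfigSketch`, `Config`, `fromEnv`, and `formatFailure` are illustrative names, the environment is passed in as a `Map` so the defaults are easy to test, and it assumes that a missing or non-numeric value falls back to the default.

```scala
object CorpusConfigSketch:
  final case class Config(timeoutMs: Int, failSamples: Int, fullError: Boolean)

  // Take the environment as a parameter (e.g. `sys.env`) instead of reading it globally.
  def fromEnv(env: Map[String, String]): Config =
    // Assumption: unparsable numbers fall back to the default rather than failing.
    def int(key: String, default: Int): Int =
      env.get(key).flatMap(_.trim.toIntOption).getOrElse(default)
    Config(
      timeoutMs = int("RUBY_CORPUS_TIMEOUT_MS", 1000),
      failSamples = int("RUBY_CORPUS_FAIL_SAMPLES", 20),
      fullError = env.get("RUBY_CORPUS_FULL_ERROR").contains("1")
    )

  // Print only the first line of a failure unless full errors were requested.
  def formatFailure(message: String, cfg: Config): String =
    if cfg.fullError then message
    else message.linesIterator.nextOption().getOrElse("")
```

The documented variables can be set on the command line for a single run, for example `RUBY_CORPUS_TIMEOUT_MS=5000 RUBY_CORPUS_FULL_ERROR=1 sbt "runMain com.github.kmizu.macro_peg.ruby.RubyCorpusRunner"`.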
- Introduce backreference as the `evalCC` method.
- Rename `pfun` to `delayedParser`, which is a better name (breaking change)
- More accurate `ParseException`
- `EvaluationException` is thrown when a function's arity does not match the number of arguments passed.
- Improved Parser
Execute the following command:

```sh
sbt test
```

This project is released under the MIT License.