Parserz is a purely-functional library for creating parsers, pretty-printers, and grammar definitions from a single, type-safe specification of a grammar.
The main idea of the library is to facilitate conversions between types A
and B
in both directions via single coherent implementation,
as in contrast of having two separate implementations for A => B
and B => A
.
The theory behind this idea is outlined in the paper Invertible Syntax Descriptions but the library does not exactly follow the implementation proposed in paper.
The following are the abstractions provided by library
trait ParsersModule {
type Input
sealed abstract class Grammar[-SI, +SO, +E, A] {}
}
Grammar
is a data type for modelling user-defined grammars and
ParsersModule
is a module that contains interpreters of Grammar
into runnable functions, i.e. parsers and printers, etc.
Internally library follows the "free encoding" design where Grammar
is defined as a generalized algebraic data type (GADT),
and all interpreters are implemented as traversals of Grammar
.
Types:
Input
- the input type; values of this type are consumed by parser and produced by printerA
- the output type; values of this type are produced by parser and consumed by printerE
- the error type; values of this type are produced by both parser and printer if they cannot proceed normallySI
- the input state type andSO
- the output state type; values of these types represent state that is threaded through the execution of parser and printer
Note: for the printer or other interpreters, meaning of Input
and A
may be completely reversed.
Unfortunately, Scala does not allow redefining type names based on one's interpretation,
so we have chosen the parser direction to be the titular one simply because library name is parserz
.
Simply add
libraryDependencies += "org.spartanz" %% "parserz" % "0.2.0"
to the SBT build.
Instantiate the module
import org.spartanz.parserz.ParsersModule
object MyParser extends ParsersModule {
override type Input = List[Char]
}
which also requires defining the input type.
Import combinators
import MyParser.Grammar._
import MyParser._
It is now possible to create grammars!
Let's create building blocks to parse the following program
"a(bx(),cy(z()))"
into description on how it can be invoked
case class Call(name: ::[Char], args: List[Call])
Full working example is here
consume
combinator and its variants are used to take a portion of input
and return remainder of the input and the consumed value. This is how it's defined:
final def consume[E, A](to: Input => E \/ (Input, A), from: ((Input, A)) => E \/ Input): Grammar[Any, Nothing, E, A]
It requires two functions to be provided both returning either a value or an error, for example
val char: G[Char] = consume({
case c :: cs => Right(cs -> c)
case Nil => Left("eoi")
}, {
case (cs, c) => Right(c :: cs)
})
The G[A]
is simply a type alias for Grammar[Any, Nothing, String, A]
, which means
- state is not used (accepts any state and returns nothing)
- error type is set to
String
- the value taken by this grammar is of type
List[Char]
(as defined above) and produced value is of typeA
There are combinators that allow to assert the value (and return error if assertion failed),
for example filter
and its variants, which is defined as
final def filter[E1 >: E](e: E1)(f: A => Boolean): Grammar[SI, SO, E1, A]
It accepts a predicate function which is applied to the result of existing grammar. For example to test that consumed character is something specific we can do
val alpha: G[Char] = char.filter("expected: alphabetical")(_.isLetter)
val comma: G[Char] = char.filter("expected: comma")(_ == ',')
Resulting grammars are new bigger building blocks.
~
orzip
combines two grammars that produceA
andB
into a grammar that producesTuple2[A, B]
|
oralt
combines two grammars that produceA
andB
into a grammar that producesEither[A, B]
rep
repeats the grammar zero or more times while it can andrep1
repeats the grammar one or more times while it can
In the program "a(bx(),cy(z()))"
, "bx(),cy(z())"
is the list of arguments of function a
.
Here is how to parse them:
it's a call to another function followed by zero or more calls to other functions prefixed by comma, or no calls at all:
val args: G[List[Call]] = ((call ~ (comma ~ call).rep) | succeed(Nil)).map({
case Left((e1, en)) => e1 :: en.map(_._2)
case Right(_) => Nil
}, {
case Nil => Right(Nil)
case e1 :: en => Left((e1, en.map((',', _))))
})
In the previous example there is a call to call
grammar that produces a value of Call
.
Call is also a recursive data structure, as it references itself:
case class Call(name: ::[Char], args: List[Call])
To allow recursion in grammar definitions, current version of library has the delay
combinator. Here is how
grammar for Call
is defined:
lazy val call: G[Call] = delay {
(alpha.rep1 ~ paren1 ~ args ~ paren2).map(
{ case (((name, _), exp), _) => Call(name, exp) },
{ case Call(name, exp) => (((name, '('), exp), ')') }
)
}
Notice the lazy
modifier as well which allows this grammar to be forward-referenced.
All grammars are now constructed and can be passed to interpreters.
The parser is simply a function constructed by the library from the given grammar:
val parser: (Unit, List[Char]) => (Unit, String \/ (List[Char], Call)) = MyParser.parser(call)
Passing program "a(bx(),cy(z()))"
into this function (along with state of Unit
) yields
Call(::('a', Nil), List(
Call(::('b', List('x')), Nil),
Call(::('c', List('y')), List(
Call(::('z', Nil), Nil)
))
))
which is indeed the structure representing function calls in the program.
The printer is also a function constructed by the library from the given grammar:
val printer: (Unit, (List[Char], Call)) => (Unit, String \/ List[Char]) = MyParser.printer(call)
Printing the value just received from the parser yields the program text back:
printer((), (Nil, value))._2.map(_.reverse.mkString)
Here Nil
is initial output, which is, of course, empty,
and there is a reverse
done on the output because of the consume
implementation above which prepends to list.
There is an interpreter that produces a description of the grammar in the Backus–Naur form (BNF).
val description: List[String] = MyParser.bnf(call)
For it to work, grammars can be described with the tag
combinator (or its symbolic equivalent @@
):
val char: G[Char] = "char" @@ consume(...)
val alpha: G[Char] = char.filter("expected: alphabetical")(_.isLetter).tag("alpha")
It interprets directly into a value, which is a printout of all tagged grammars. This is how it looks like for the example above
<alpha> ::= <char>
<open paren> ::= <char>
<comma> ::= <char>
<args> ::= (<call> List(<comma> <call>) | )
<close paren> ::= <char>
<call> ::= NEL(<alpha>) <open paren> <args> <close paren>