A very basic backtracking pattern recognizer implemented in Scala. It provides a lightweight, composable DSL for parsing streams of input with support for backtracking, cut points, and capture transformations.
Complete API documentation is available at: https://edadma.github.io/recognizer/
- Cross-platform: Supports JVM, Scala.js, and Scala Native via a cross-project setup
- Composable DSL: Build complex patterns using sequence (`~`), alternation (`|`), repetition (`rep`, `rep1`), optional (`opt`), negation (`not`), and more
- Backtracking & Cut: Fine-grained control over backtracking with the cut operator (`!!`)
- Value Capture & Transform: Capture matched input positions or values and apply custom transformations
- Built-in Patterns: Common patterns for letters, digits, identifiers, whitespace, links, images, etc.
- Lightweight: No external dependencies beyond the Scala standard library, plus ScalaTest for testing
Add the following to your `build.sbt`:
libraryDependencies += "io.github.edadma" %%% "recognizer" % "0.0.3"
For cross-platform projects, ensure your `project/plugins.sbt` includes:
addSbtPlugin("org.portable-scala" % "sbt-scalajs-crossproject" % "1.3.2")
addSbtPlugin("org.portable-scala" % "sbt-scala-native-crossproject" % "1.3.2")
addSbtPlugin("org.scala-js" % "sbt-scalajs" % "1.19.0")
addSbtPlugin("org.scala-native" % "sbt-scala-native" % "0.5.7")
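
A cross-built project that uses these plugins might be defined roughly as follows. This is a sketch, not taken from this project's build: the project name `myParser` and the Scala version are placeholders, and whether the library is published for that Scala version on every platform is an assumption to check.

```scala
// build.sbt (sketch; names and versions below are placeholders)
lazy val myParser = crossProject(JSPlatform, JVMPlatform, NativePlatform)
  .in(file("."))
  .settings(
    name := "my-parser",
    scalaVersion := "3.3.4", // placeholder: pick a version the library supports
    // %%% selects the artifact that matches each platform
    libraryDependencies += "io.github.edadma" %%% "recognizer" % "0.0.3"
  )
```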
Create a simple parser by mixing in the `CharRecognizer` and `Testing` traits:
import io.github.edadma.recognizer._
object Example extends App with CharRecognizer[Char] with Testing {
  // Define a simple calculator for addition and subtraction
  lazy val expr: Pattern = term ~ rep(('+' | '-') ~ term ~ action3[Any, Char, Any] {
    case (a, '+', b) => a.asInstanceOf[Int] + b.asInstanceOf[Int]
    case (a, '-', b) => a.asInstanceOf[Int] - b.asInstanceOf[Int]
  })
  lazy val term: Pattern = digits ~ action[String](_.toInt)
  // Test it
  val input = "123+45-67"
  parse(input, expr) match {
    case Some((Some(result), rest)) =>
      println(s"Result: $result, Remaining: '$rest'")
    case None =>
      println("Parsing failed")
  }
  // Output: Result: 101, Remaining: ''
}
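
The example combines each operator with the running result, so the expression is evaluated left to right. The same arithmetic can be sketched in plain Scala without the library. The names `LeftFoldSketch` and `eval` are hypothetical, and the sketch assumes well-formed input: non-negative integers separated by `+` or `-`.

```scala
object LeftFoldSketch {
  // Evaluate integers separated by '+' or '-' from left to right.
  // Assumes well-formed input such as "123+45-67" (no leading sign, no spaces).
  def eval(input: String): Int = {
    val numbers = input.split("[+-]").toList.map(_.toInt)
    val ops = input.filter(c => c == '+' || c == '-').toList
    // Pair each operator with the number that follows it, then fold.
    ops.zip(numbers.tail).foldLeft(numbers.head) {
      case (acc, ('+', n)) => acc + n
      case (acc, (_, n))   => acc - n
    }
  }
}
```

Here `LeftFoldSketch.eval("123+45-67")` computes `(123 + 45) - 67`, which is `101`, the same value the example prints.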
| Combinator | Description | Example |
|---|---|---|
| `p ~ q` | Sequence: match p then q | `digit ~ letter` matches "5a" |
| `p \| q` | Alternation: match p or q | |
| `rep(p)` | Zero-or-more repetitions of p | `rep(digit)` matches "123" or "" |
| `rep1(p)` | One-or-more repetitions of p | `rep1(digit)` matches "123" but not "" |
| `opt(p)` | Optional match of p | `opt('-') ~ digit` matches "-5" or "5" |
| `not(p)` | Negative lookahead: succeed only if p fails | `letter ~ not(digit)` matches "a" but not "a5" |
| `capture(p)(f)` | Capture matched input for pattern p | `capture(rep1(digit))((i, e) => i.listElem(e).mkString.toInt)` |
| `!!` | Cut: disallow backtracking past this point | `'a' ~ !! ~ 'b'` will fail on "ac" without trying alternatives |
| `string(p)` | Shortcut to capture characters as string | `string(rep1(digit))` captures digits as string |
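
To make the backtracking behaviour of these combinators concrete, here is a toy matcher written from scratch. It is not the library's implementation and shares no code with it: every name (`ToyMatcher`, `Pat`, `Res`, `lit`, `seq`, `alt`, `cut`, `matches`) is invented for illustration. How the real `!!` scopes a cut is not specified here, so the toy version simply aborts the whole match.

```scala
// Toy backtracking matcher: an illustration only, NOT the recognizer library's code.
object ToyMatcher {
  sealed trait Res
  case object Ok extends Res    // the whole match succeeded
  case object Fail extends Res  // ordinary failure: an enclosing alternative may retry
  case object Abort extends Res // failure after a cut: no alternative is retried

  // A pattern receives the input, a start index, and a continuation `k`
  // that reports whether the rest of the match succeeds from a given index.
  type Pat = (String, Int, Int => Res) => Res

  def ch(pred: Char => Boolean): Pat =
    (s, i, k) => if (i < s.length && pred(s(i))) k(i + 1) else Fail
  def lit(c: Char): Pat = ch(_ == c)
  val digit: Pat = ch(_.isDigit)
  val letter: Pat = ch(_.isLetter)

  // Evaluate `other` only after an ordinary failure.
  private def orElse(r: Res)(other: => Res): Res = r match {
    case Fail => other
    case done => done
  }

  def seq(p: Pat, q: Pat): Pat = (s, i, k) => p(s, i, j => q(s, j, k))
  def alt(p: Pat, q: Pat): Pat = (s, i, k) => orElse(p(s, i, k))(q(s, i, k))
  def opt(p: Pat): Pat = (s, i, k) => orElse(p(s, i, k))(k(i))
  // Greedy zero-or-more; the `j > i` check stops looping on empty matches.
  def rep(p: Pat): Pat = (s, i, k) =>
    orElse(p(s, i, j => if (j > i) rep(p)(s, j, k) else Fail))(k(i))
  def rep1(p: Pat): Pat = seq(p, rep(p))
  // Negative lookahead: consumes nothing, succeeds only if `p` cannot match here.
  def not(p: Pat): Pat = (s, i, k) => if (p(s, i, _ => Ok) == Ok) Fail else k(i)
  // Cut: once reached, a later failure becomes Abort instead of Fail.
  val cut: Pat = (s, i, k) => k(i) match {
    case Fail => Abort
    case done => done
  }

  // True when `p` matches all of `s`.
  def matches(p: Pat, s: String): Boolean =
    p(s, 0, j => if (j == s.length) Ok else Fail) == Ok
}
```

With `import ToyMatcher._`, `matches(seq(rep(digit), digit), "123")` is true because `rep` gives back the last digit when the trailing `digit` needs it. By contrast, `matches(alt(seq(lit('a'), seq(cut, lit('b'))), seq(lit('a'), lit('c'))), "ac")` is false: the first branch passes the cut and then fails, so the second branch is never tried.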
import io.github.edadma.recognizer._
object CSVExample extends App with CharRecognizer[Char] with Testing {
  // Define CSV patterns
  val escapedField = '"' ~ capture(rep(noneOf('"') | "\"\"" ~ action[String](_ => "\"")))((i, e) =>
    i.listElem(e).mkString) ~ '"'
  val rawField = string(rep(noneOf(',', '\n', '\r')))
  val field = escapedField | rawField
  val record = field ~ rep(',' ~ field) ~ action[List[Any]](fields => fields)
  // Parse a CSV line
  val input = """simple,123,"quoted,field","escaped""quote" """
  parse(input, record) match {
    case Some((Some(result), rest)) =>
      println(s"Fields: ${result.asInstanceOf[List[Any]].mkString(", ")}")
      println(s"Remaining: '$rest'")
    case None =>
      println("Parsing failed")
  }
}
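
The quoting rules those patterns encode (a quoted field may contain commas, and a doubled quote inside it stands for one literal quote) can be checked against a hand-written splitter. This is a plain-Scala sketch that does not use the library; `CsvSketch` and `splitLine` are hypothetical names, and it handles a single line only.

```scala
object CsvSketch {
  // Split one CSV line into fields. Inside quotes, "" stands for a literal quote
  // and commas are ordinary characters.
  def splitLine(line: String): List[String] = {
    val fields = List.newBuilder[String]
    val cur = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line(i)
      if (inQuotes) {
        if (c == '"' && i + 1 < line.length && line(i + 1) == '"') {
          cur.append('"') // doubled quote: keep one and skip the second
          i += 1
        } else if (c == '"') inQuotes = false // closing quote
        else cur.append(c)
      } else if (c == '"') inQuotes = true // opening quote
      else if (c == ',') { fields += cur.toString; cur.clear() }
      else cur.append(c)
      i += 1
    }
    fields += cur.toString // the last field has no trailing comma
    fields.result()
  }
}
```

For the line `simple,123,"quoted,field","escaped""quote"` this sketch yields the four fields `simple`, `123`, `quoted,field`, and `escaped"quote`.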
The library is powerful enough to build parsers for common formats like JSON. See the documentation site for complete examples.
- `Recognizer[W, E]`: Core trait providing pattern combinators over input type `I = Input[W, E]`.
- `CharRecognizer[W]`: Specialization for character-based inputs, with helpers like `digit`, `alpha`, `ident`, `kw`, `sym`, etc.
- `Input[W, E]`: Represents a stream of elements `E` with wrapped values `W`; includes helpers to collect the rest of the input.
- `StringInput`: Implementation of `Input` for string parsing.
- `Testing`: Mixin providing a convenient `parse` method for quick tests and REPL usage.
For full details, refer to the API documentation.
The library returns `None` when parsing fails. For more detailed error reporting, you can implement your own error handling using combinators like `pointer` and `action`:
def withErrorLocation(p: Pattern): Pattern =
  pointer ~ p ~ action[Any](result => result) |
    pointer ~ action[Input[Char, Char]](input =>
      throw new RuntimeException(s"Parse error at position ${input.listElem(StringInput("", 0)).length}")
    )
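
A common way to pick a useful error position in a backtracking parser is to remember the furthest index at which any test failed. The sketch below shows that idea in a self-contained toy matcher. It does not use the library's `pointer` combinator or any other part of its API, and all names (`FurthestFailureSketch`, `Matcher`, `check`) are hypothetical.

```scala
object FurthestFailureSketch {
  // A toy matcher over `s` that records the furthest failing index.
  final class Matcher(s: String) {
    var furthest = 0 // highest index at which a test failed so far

    // A pattern takes a start index and a continuation for the rest of the match.
    type Pat = (Int, Int => Boolean) => Boolean

    private def fail(i: Int): Boolean = { furthest = math.max(furthest, i); false }

    def ch(pred: Char => Boolean): Pat =
      (i, k) => if (i < s.length && pred(s(i))) k(i + 1) else fail(i)
    def seq(p: Pat, q: Pat): Pat = (i, k) => p(i, j => q(j, k))
    // Greedy zero-or-more; the `j > i` check stops looping on empty matches.
    def rep(p: Pat): Pat = (i, k) => p(i, j => j > i && rep(p)(j, k)) || k(i)
    def rep1(p: Pat): Pat = seq(p, rep(p))

    // Succeeds only if `p` matches the whole input; otherwise reports `furthest`.
    def run(p: Pat): Either[String, Unit] =
      if (p(0, j => j == s.length || fail(j))) Right(())
      else Left(s"Parse error at position $furthest")
  }

  // Grammar: number (('+' | '-') number)*
  def check(input: String): Either[String, Unit] = {
    val m = new Matcher(input)
    val number = m.rep1(m.ch(_.isDigit))
    val op = m.ch(c => c == '+' || c == '-')
    m.run(m.seq(number, m.rep(m.seq(op, number))))
  }
}
```

For example, `FurthestFailureSketch.check("12+x")` returns `Left("Parse error at position 3")`: the matcher backtracks to earlier positions after failing on `x`, but the reported index stays at the deepest point it reached.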
Unit tests are written with ScalaTest. Run them with:
sbt test
Contributions are welcome! To contribute:
- Fork the repository
- Create a feature branch: `git checkout -b feature/YourFeature`
- Commit your changes and push to your fork
- Open a Pull Request against the `main` branch
Please follow the existing code style and include tests for new features.
This project is licensed under the ISC License. See the LICENSE file for details.