A Scala library for indentation-sensitive lexical analysis using parser combinators. Extends StdLexical to automatically generate INDENT, DEDENT, and NEWLINE tokens for Python-style block structure.
libraryDependencies += "io.github.edadma" %%% "indentation" % "0.0.2"

Cross-compiled for JVM, Scala.js, and Scala Native.
import io.github.edadma.indentation.IndentationLexical
import scala.util.parsing.combinator.syntactical.StandardTokenParsers
import scala.util.parsing.input.CharSequenceReader
class MyParser extends StandardTokenParsers {
  override val lexical: IndentationLexical = new IndentationLexical(
    newlineBeforeIndent = true,
    newlineAfterDedent = true,
    startLineJoining = List("(", "["),
    endLineJoining = List(")", "]"),
    lineComment = "//",
    blockCommentStart = "/*",
    blockCommentEnd = "*/"
  ) {
    reserved ++= List("if", "else", "then", "print")
    delimiters ++= List("=", "+", "-", "*", "/", "<", ">", "(", ")")
  }

  import lexical.{Newline, Indent, Dedent}

  lazy val program: Parser[List[String]] =
    repsep(statement, rep1(Newline)) <~ opt(rep(Newline))

  lazy val statement: Parser[String] =
    "if" ~> ident ~ ("then" ~> block) ^^ { case id ~ _ => s"if($id)" } |
      "print" ~> ident ^^ (id => s"print($id)") |
      ident ~ ("=" ~> ident) ^^ { case a ~ b => s"$a=$b" }

  lazy val block: Parser[List[String]] =
    Newline ~> Indent ~> repsep(statement, rep1(Newline)) <~ opt(Newline) <~ Dedent

  def parse(input: String) =
    phrase(program)(lexical.read(new CharSequenceReader(input)))
}

Usage:
val parser = new MyParser
parser.parse("x = y\nif x then\n print x\nprint done")
// Success(List(x=y, if(x), print(done)), ...)

Note: create a new parser instance per parse call, as StandardTokenParsers has internal mutable state.
| Parameter | Description |
|---|---|
| `newlineBeforeIndent` | Emit a Newline token before each Indent token |
| `newlineAfterDedent` | Emit a Newline token after each Dedent token |
| `startLineJoining` | Tokens that suppress indentation (e.g., `"("`, `"["`) |
| `endLineJoining` | Tokens that end line joining (e.g., `")"`, `"]"`) |
| `lineComment` | Line comment prefix (e.g., `"//"`, `"#"`) |
| `blockCommentStart` | Block comment start delimiter (e.g., `"/*"`) |
| `blockCommentEnd` | Block comment end delimiter (e.g., `"*/"`) |
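
As an illustration of how these parameters combine, the sketch below configures the lexer for a language with `#` line comments and braces as additional line-joining tokens. It reuses the constructor call from the usage example unchanged apart from the values. The object name `HashCommentLexer` and the chosen reserved words and delimiters are hypothetical, and registering the joining tokens as delimiters is an assumption made to be safe rather than a documented requirement.

```scala
import io.github.edadma.indentation.IndentationLexical

// Hypothetical configuration; only the parameter values differ from the usage example.
object HashCommentLexer {
  val lexical: IndentationLexical = new IndentationLexical(
    newlineBeforeIndent = true,
    newlineAfterDedent = true,
    startLineJoining = List("(", "[", "{"), // braces also join lines in this sketch
    endLineJoining = List(")", "]", "}"),
    lineComment = "#",                      // Python-style line comments
    blockCommentStart = "/*",
    blockCommentEnd = "*/"
  ) {
    reserved ++= List("if", "else")
    // Assumption: joining tokens are registered as delimiters so they are lexed as tokens.
    delimiters ++= List("=", "(", ")", "[", "]", "{", "}")
  }
}
```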
The lexer generates three special tokens in addition to the standard StdLexical tokens:
- `Newline` -- end of a logical line (same indentation level)
- `Indent` -- indentation increased
- `Dedent` -- indentation decreased
Indents and dedents are always balanced. The lexer handles:
- Tab or space indentation (but not mixed on the same line)
- Blank lines and comment-only lines (skipped)
- Line joining inside brackets/parentheses
- Proper dedent generation at end of input
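
The balancing behaviour described above is commonly implemented with a stack of indentation widths. The following is a minimal, self-contained sketch of that general technique, not the library's actual implementation. The names `IndentSketch`, `Tok`, `Line`, and `tokenize` are hypothetical, whole lines stand in for real tokens, and the token order assumes both `newlineBeforeIndent` and `newlineAfterDedent` are enabled. Comments and mixed tab/space checks are left out.

```scala
import scala.collection.mutable.ListBuffer

object IndentSketch {
  // Hypothetical token types for this sketch only.
  sealed trait Tok
  case object Newline extends Tok
  case object Indent extends Tok
  case object Dedent extends Tok
  final case class Line(text: String) extends Tok // stands in for a line's ordinary tokens

  def tokenize(src: String): List[Tok] = {
    val out = ListBuffer.empty[Tok]
    var stack = List(0) // indentation widths of the currently open blocks

    // Blank lines are skipped and never affect indentation.
    for (raw <- src.split("\n") if raw.trim.nonEmpty) {
      val width = raw.takeWhile(c => c == ' ' || c == '\t').length
      if (width > stack.head) {
        // Deeper than the current block: open a new one.
        stack = width :: stack
        out += Newline
        out += Indent
      } else {
        // Close every block that is deeper than this line.
        while (width < stack.head) {
          stack = stack.tail
          out += Dedent
        }
        require(width == stack.head, s"inconsistent dedent to column $width")
        if (out.nonEmpty) out += Newline // separates this line from the previous one
      }
      out += Line(raw.trim)
    }

    // At end of input, close all blocks that are still open.
    while (stack.head > 0) {
      stack = stack.tail
      out += Dedent
    }
    out.toList
  }
}
```

Because every `Indent` pushes one width and every `Dedent` pops one, the two counts are equal once the end-of-input loop has run.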
Expressions inside parentheses or brackets can span multiple lines without generating indentation tokens:
result = (1 +
          2 +
          3)
Configure which tokens trigger line joining via startLineJoining and endLineJoining.
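
One way to picture line joining is as a bracket-depth counter: while the depth is above zero, physical line breaks do not end the logical line. The sketch below shows that idea in isolation. It is an assumption-laden illustration rather than the library's code: `LineJoinSketch` and `logicalLines` are hypothetical names, single characters stand in for the configured start and end tokens, and brackets inside string literals or comments are not handled.

```scala
import scala.collection.mutable.ListBuffer

object LineJoinSketch {
  // Merges physical lines into logical lines while inside open brackets.
  def logicalLines(
      src: String,
      open: Set[Char] = Set('(', '['),
      close: Set[Char] = Set(')', ']')
  ): List[String] = {
    val out = ListBuffer.empty[String]
    val current = new StringBuilder
    var depth = 0 // number of unclosed start tokens seen so far

    for (raw <- src.split("\n")) {
      // A continuation line contributes its text but not its indentation.
      val piece = if (depth > 0) raw.trim else raw
      if (current.nonEmpty) current.append(' ')
      current.append(piece)

      for (c <- raw) {
        if (open(c)) depth += 1
        else if (close(c) && depth > 0) depth -= 1
      }

      // Only a line break at depth zero ends the logical line.
      if (depth == 0) {
        out += current.toString
        current.clear()
      }
    }
    if (current.nonEmpty) out += current.toString // unclosed bracket at end of input
    out.toList
  }
}
```

Applied to the three-line `result` expression above, this yields a single logical line, so an indentation tracker that runs on logical lines sees no indentation change.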
sbt compile # All platforms
sbt indentationJVM/compile # JVM only
sbt indentationJVM/test # Run tests

ISC License -- see LICENSE for details.