A Scala library for indentation-sensitive lexical analysis using parser combinators.
Indentation extends Scala's standard lexical analyzer to support Python-style indentation sensitivity. It automatically generates INDENT, DEDENT, and NEWLINE tokens, making it easy to build parsers for languages that use indentation for block structure.
It is a good fit for domain-specific languages, configuration parsers, and any language where you want clean, readable syntax without explicit block delimiters.
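To make the idea of automatic INDENT/DEDENT generation concrete, here is a minimal standalone sketch of the classic indentation-stack technique. It is not the library's implementation: the `LayoutSketch` object, its `Tok` types, and the `layout` function are hypothetical names used only for illustration, and the sketch assumes spaces-only indentation and ignores comments and line joining.

```scala
object LayoutSketch {
  // Hypothetical token types for this sketch only (not the library's classes).
  sealed trait Tok
  case object Indent extends Tok
  case object Dedent extends Tok
  case object Newline extends Tok
  final case class Text(s: String) extends Tok

  def layout(src: String): List[Tok] = {
    val out = List.newBuilder[Tok]
    var stack = List(0) // enclosing indentation widths, innermost first

    // Blank lines carry no indentation information, so they are skipped.
    for (line <- src.linesIterator if line.trim.nonEmpty) {
      val width = line.takeWhile(_ == ' ').length
      if (width > stack.head) {
        // Deeper than the current block: open a new one.
        stack = width :: stack
        out += Indent
      } else {
        // Shallower: close blocks until the widths match again.
        while (width < stack.head) {
          stack = stack.tail
          out += Dedent
        }
        if (width != stack.head) sys.error(s"inconsistent dedent: '$line'")
      }
      out += Text(line.trim)
      out += Newline
    }
    // Close any blocks still open at end of input.
    stack.init.foreach(_ => out += Dedent)
    out.result()
  }
}
```

Each physical line is compared against the innermost open block: a deeper line pushes one level and emits one Indent, while a shallower line pops levels and emits one Dedent per level closed.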
🔹 Indentation-sensitive lexing - Automatic INDENT/DEDENT token generation
🔹 Line joining - Configurable bracket/parentheses handling across lines
🔹 Comment support - Built-in line and block comment processing
🔹 Cross-platform - Works on JVM, JavaScript, and Scala Native
🔹 Parser combinator integration - Extends StdLexical seamlessly
🔹 Comprehensive testing - Tests cover indentation, line joining, comments, mixed whitespace, and error conditions
Add to your build.sbt:
libraryDependencies += "io.github.edadma" %%% "indentation" % "0.0.1"
For Maven:
<dependency>
  <groupId>io.github.edadma</groupId>
  <artifactId>indentation_3</artifactId>
  <version>0.0.1</version>
</dependency>
import io.github.edadma.indentation.IndentationLexical
val lexical = new IndentationLexical(
  newlineBeforeIndent = false,
  newlineAfterDedent = true,
  startLineJoining = List("(", "["),
  endLineJoining = List(")", "]"),
  lineComment = "//",
  blockCommentStart = "/*",
  blockCommentEnd = "*/"
) {
  // Add your language-specific tokens
  reserved ++= List("if", "else", "then")
  // ">" is included because the sample below uses it
  delimiters ++= List("=", "+", "-", ">", "(", ")")
}
// Scan some indented code
val tokens = lexical.scan("""
x = 1
if x > 0 then
  print x
  y = x + 1
print "done"
""")
// Results in: [Identifier("x"), Keyword("="), NumericLit("1"), Newline,
// Keyword("if"), Identifier("x"), Keyword(">"), NumericLit("0"),
// Keyword("then"), Newline, Indent, ...]
The IndentationLexical constructor accepts:
- newlineBeforeIndent - Insert a newline token before indent tokens
- newlineAfterDedent - Insert a newline token after dedent tokens
- startLineJoining - Tokens that start line joining (e.g., "(", "[")
- endLineJoining - Tokens that end line joining (e.g., ")", "]")
- lineComment - Line comment prefix (e.g., "//", "#")
- blockCommentStart - Block comment start delimiter
- blockCommentEnd - Block comment end delimiter
Line joining allows expressions to span multiple lines within parentheses or brackets without generating indentation tokens:
// This works without indentation tokens
result = (1 + 2 +
          3 + 4)
coordinates = [
  x, y, z,
  a, b, c
]
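The effect of line joining can be illustrated with a small standalone sketch that tracks bracket depth and merges physical lines into logical lines while a bracket is still open. This is an assumption-laden illustration, not how the library works internally: `LineJoinSketch` and `logicalLines` are hypothetical names, brackets are single characters, and brackets inside string literals or comments are not handled.

```scala
object LineJoinSketch {
  // Merges physical lines into logical lines while any opening bracket
  // from `open` is still waiting for its closing bracket from `close`.
  def logicalLines(
      src: String,
      open: Set[Char] = Set('(', '['),
      close: Set[Char] = Set(')', ']')
  ): List[String] = {
    val out = List.newBuilder[String]
    val current = new StringBuilder
    var depth = 0 // number of currently open brackets

    for (line <- src.linesIterator) {
      // Inside brackets, indentation is irrelevant: append the trimmed text.
      if (depth > 0) current.append(' ').append(line.trim)
      else current.append(line)

      line.foreach { c =>
        if (open(c)) depth += 1
        else if (close(c) && depth > 0) depth -= 1
      }

      // Only a line that leaves every bracket closed ends the logical line.
      if (depth == 0) {
        out += current.toString
        current.clear()
      }
    }
    // An unclosed bracket at end of input still yields the partial line.
    if (current.nonEmpty) out += current.toString
    out.result()
  }
}
```

For example, `LineJoinSketch.logicalLines("result = (1 + 2 +\n    3 + 4)\nx = 1")` yields the two logical lines `result = (1 + 2 + 3 + 4)` and `x = 1`, so the continuation line never reaches indentation processing.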
Here's a complete toy language parser using the indentation lexer:
import io.github.edadma.indentation.IndentationLexical
import scala.util.parsing.combinator.syntactical.StandardTokenParsers
// AST definitions
sealed trait Stmt
case class Assign(variable: String, value: Expr) extends Stmt
case class If(condition: Expr, thenBlock: List[Stmt], elseBlock: Option[List[Stmt]] = None) extends Stmt
case class Print(expr: Expr) extends Stmt
sealed trait Expr
case class Var(name: String) extends Expr
case class Num(value: Int) extends Expr
case class BinOp(left: Expr, op: String, right: Expr) extends Expr
object ToyParser extends StandardTokenParsers {
  override val lexical = new IndentationLexical(
    newlineBeforeIndent = true,
    newlineAfterDedent = true,
    startLineJoining = List("("),
    endLineJoining = List(")"),
    lineComment = "//",
    blockCommentStart = "/*",
    blockCommentEnd = "*/"
  ) {
    reserved ++= List("if", "else", "then", "print")
    delimiters ++= List("=", "+", "-", "*", "/", "<", ">", "(", ")")
  }

  import lexical.{Newline, Indent, Dedent}

  def program: Parser[List[Stmt]] = rep(statement)

  def statement: Parser[Stmt] =
    assignment | ifStatement | printStatement

  def assignment: Parser[Assign] =
    ident ~ "=" ~ expr <~ Newline ^^ { case id ~ _ ~ value => Assign(id, value) }

  def ifStatement: Parser[If] =
    "if" ~ expr ~ "then" ~ Newline ~ Indent ~ rep1(statement) ~ Dedent ~ opt(elseClause) ^^ {
      case _ ~ condition ~ _ ~ _ ~ _ ~ thenStmts ~ _ ~ elseStmts =>
        If(condition, thenStmts, elseStmts)
    }

  // ... rest of parser
}
// Usage
val code = """
x = 10
if x > 5 then
  print x
  y = x * 2
  print y
else
  print 0
"""
// parse is not shown above; it belongs to the elided rest of ToyParser
ToyParser.parse(code) match {
  case ToyParser.Success(ast, _) => println(s"Parsed: $ast")
  case failure => println(s"Parse failed: $failure")
}
This example demonstrates:
- Variable assignments
- If/else statements with proper indentation
- Expression parsing with operator precedence
- Block structure using INDENT/DEDENT tokens
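The toy parser above leaves its expression rules elided. As a hedged illustration of how operator precedence is commonly layered with parser combinators, the following standalone sketch uses the plain StdLexical that StandardTokenParsers provides by default, so it needs only scala-parser-combinators and not this library. `ExprSketch`, `parseExpr`, and the rule names are hypothetical and are not taken from the example's elided code; the AST classes mirror the ones defined above.

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

object ExprSketch extends StandardTokenParsers {
  lexical.delimiters ++= List("+", "-", "*", "/", "(", ")")

  sealed trait Expr
  case class Var(name: String) extends Expr
  case class Num(value: Int) extends Expr
  case class BinOp(left: Expr, op: String, right: Expr) extends Expr

  // Lowest precedence: left-associative + and -.
  def expr: Parser[Expr] =
    term ~ rep(("+" | "-") ~ term) ^^ { case first ~ rest =>
      rest.foldLeft(first) { case (left, op ~ right) => BinOp(left, op, right) }
    }

  // Higher precedence: left-associative * and /.
  def term: Parser[Expr] =
    factor ~ rep(("*" | "/") ~ factor) ^^ { case first ~ rest =>
      rest.foldLeft(first) { case (left, op ~ right) => BinOp(left, op, right) }
    }

  def number: Parser[Expr] = numericLit ^^ { s => Num(s.toInt) }
  def variable: Parser[Expr] = ident ^^ { s => Var(s) }

  // Atoms and parenthesized sub-expressions bind tightest.
  def factor: Parser[Expr] = number | variable | "(" ~> expr <~ ")"

  def parseExpr(input: String): ParseResult[Expr] =
    phrase(expr)(new lexical.Scanner(input))
}
```

Each precedence level gets its own rule that delegates to the next tighter level, and folding the repeated operands from the left makes the binary operators left-associative.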
The lexer generates these special tokens:
- Newline - End of logical line
- Indent - Increase in indentation level
- Dedent - Decrease in indentation level
Plus all standard tokens from StdLexical: identifiers, numeric literals, string literals, keywords, and delimiters.
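To show how a consumer might turn a flat stream of these tokens back into nested block structure, here is a small standalone sketch. The token and node types in `BlockTreeSketch` are hypothetical stand-ins defined locally so the example runs on its own; they are not the library's token classes, and a real parser would typically do this through grammar rules like those in the toy parser.

```scala
object BlockTreeSketch {
  // Hypothetical stand-ins for the lexer's tokens.
  sealed trait Tok
  case object Indent extends Tok
  case object Dedent extends Tok
  case object Newline extends Tok
  final case class Word(s: String) extends Tok

  sealed trait Node
  final case class Line(words: List[String]) extends Node
  final case class Block(children: List[Node]) extends Node

  // Reads nodes until a Dedent or end of input.
  // Returns the nodes read and the tokens that remain.
  def nodes(toks: List[Tok]): (List[Node], List[Tok]) = toks match {
    case Nil => (Nil, Nil)
    case Dedent :: rest => (Nil, rest) // closes the current block
    case Newline :: rest => nodes(rest) // line separators carry no structure
    case Indent :: rest =>
      val (inner, afterBlock) = nodes(rest) // everything up to the matching Dedent
      val (siblings, remaining) = nodes(afterBlock)
      (Block(inner) :: siblings, remaining)
    case _ =>
      // A run of Word tokens forms one line.
      val (words, rest) = toks.span(_.isInstanceOf[Word])
      val (siblings, remaining) = nodes(rest)
      (Line(words.collect { case Word(s) => s }) :: siblings, remaining)
  }
}
```

Every Indent starts a recursive call that runs until its matching Dedent, which is why balanced Indent/Dedent pairs are enough to recover arbitrarily deep nesting.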
git clone https://github.com/edadma/indentation.git
cd indentation
sbt test # Run tests
sbt compile # Compile for JVM
sbt js/compile # Compile for JavaScript
sbt native/compile # Compile for Native
The library includes comprehensive tests covering:
- Basic indentation/dedentation
- Line joining with parentheses and brackets
- Comment handling
- Mixed whitespace scenarios
- Edge cases and error conditions
Run tests with: sbt test
Well suited to building parsers for:
- Configuration languages - YAML-style config files
- Domain-specific languages - Clean syntax for business rules
- Template languages - Indented template systems
- Scripting languages - Python-style scripting DSLs
- Build files - Indented build configurations
- Scala 3.7.1+
- scala-parser-combinators (automatically included)
ISC License - see LICENSE file for details.
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Making indentation-sensitive parsing simple and reliable. 📝