taig / scala-pygments

A Scala wrapper around Pygments based on GraalVM's Python Runtime


Scala Pygments



Prerequisites

GraalVM with Python support must be used as the Java runtime. Install the Python language component with:

gu install python

Pygments must be installed in a GraalVM Python environment, using that environment's pip:

pip install Pygments

To run the integration tests, first create a Python virtual environment and install Pygments into it:

graalpython -m venv ${JAVA_HOME}/languages/python/scala-pygments/
${JAVA_HOME}/languages/python/scala-pygments/bin/pip install Pygments
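Before wiring up the library, it can help to check that this environment is actually in place. The following is a minimal sketch using only the Scala standard library: it resolves the interpreter inside the virtual environment created above and checks that `import pygments` succeeds. `PythonEnv`, `venvPython` and `pygmentsAvailable` are hypothetical helpers, not part of scala-pygments, and the directory layout is assumed from the `graalpython -m venv` command above.

```scala
import java.nio.file.{Files, Path, Paths}
import scala.sys.process.Process
import scala.util.Try

// Hypothetical helpers, not part of scala-pygments.
object PythonEnv {
  // Assumes the venv layout created above: ${JAVA_HOME}/languages/python/scala-pygments/
  // Returns None instead of a "null/..." path when JAVA_HOME is unset.
  def venvPython(env: Map[String, String]): Option[Path] =
    env.get("JAVA_HOME").map { home =>
      Paths.get(home, "languages", "python", "scala-pygments", "bin", "python")
    }

  // True only if the file is executable and `python -c "import pygments"` exits with 0.
  // A failure to launch the process at all is treated as "not available".
  def pygmentsAvailable(python: Path): Boolean =
    Files.isExecutable(python) &&
      Try(Process(Seq(python.toString, "-c", "import pygments")).!).toOption.contains(0)
}
```

The usage example builds the same path with `s"${System.getenv("JAVA_HOME")}/..."`, which silently produces a path starting with `null/` when `JAVA_HOME` is unset; returning an `Option` makes that case explicit, e.g. `PythonEnv.venvPython(sys.env).filter(PythonEnv.pygmentsAvailable)`.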

Installation

sbt

libraryDependencies ++=
  "io.taig" %%% "scala-pygments-core" % "x.y.z" :: 
  "io.taig" %% "scala-pygments-graalvm-python" % "x.y.z" ::
  "io.taig" %% "scala-pygments-cli" % "x.y.z" ::
  Nil

Usage

Currently, this library exposes a single method, tokenize, which splits source code into tokens using the Pygments lexer of the given name.

import cats.effect.{IO, IOApp}
import io.taig.pygments.GraalVmPythonPygments

object App extends IOApp.Simple {
  val python = s"${System.getenv("JAVA_HOME")}/languages/python/scala-pygments/bin/python"

  override def run: IO[Unit] =
    GraalVmPythonPygments
      .default[IO](python)
      .use(_.tokenize("Scala", """println("Hello world!")"""))
      .flatMap(IO.println)
}

sbt> run
List(Fragment(Name(None),'println'), Fragment(Punctuation,'('), Fragment(Literal(String(None)),'"Hello world!"'), Fragment(Punctuation,')'), Fragment(Text(None),'\n'))
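Each fragment pairs a Pygments token type with the text it matched, so concatenating the values in order gives back the input (plus the trailing newline that Pygments lexers append by default). A minimal sketch of consuming such a list, for example to render HTML. `Fragments`, `Fragment`, `source` and `toHtml` are hypothetical and use a simplified token representation (the token path as a list of names, e.g. `List("Literal", "String")`); they are not the library's own types, and the CSS class naming is an assumption.

```scala
// Hypothetical, simplified model; not the library's own Fragment type.
object Fragments {
  // A Pygments token path (e.g. List("Literal", "String")) paired with the matched text.
  final case class Fragment(token: List[String], value: String)

  // Concatenating all values in order reproduces the text that was tokenized.
  def source(fragments: List[Fragment]): String =
    fragments.map(_.value).mkString

  // Escape '&' first so the entities produced by later replacements are not double-escaped.
  private def escape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;")

  // Wraps each fragment in a <span> whose class is derived from the token path,
  // e.g. List("Literal", "String") becomes "literal-string" (an assumed naming scheme).
  def toHtml(fragments: List[Fragment]): String =
    fragments.map { f =>
      val cls = f.token.map(_.toLowerCase).mkString("-")
      s"""<span class="$cls">${escape(f.value)}</span>"""
    }.mkString("<pre>", "", "</pre>")
}
```

With the real result type, only the extraction of the token path and the value would need to change; the rendering itself stays a simple map over the fragments.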

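Benchmarks

Scores are throughput in operations per second (higher is better), measured with JMH using 10 warmup iterations, 5 measurement iterations, 1 fork and 4 threads. The output uses a comma as the decimal separator, so 0,958 means 0.958.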

> benchmarks/Jmh/run -wi 10 -i 5 -f1 -t4
[info] Benchmark                                              Mode  Cnt     Score    Error  Units
[info] CliTokenizeBenchmark.tokenizeLong                     thrpt    5    12,419 ±  0,469  ops/s
[info] CliTokenizeBenchmark.tokenizeMedium                   thrpt    5    16,245 ±  0,643  ops/s
[info] CliTokenizeBenchmark.tokenizeShort                    thrpt    5    16,699 ±  0,391  ops/s
[info] GraalVmPythonDefaultTokenizeBenchmark.tokenizeLong    thrpt    5     0,958 ±  0,063  ops/s
[info] GraalVmPythonDefaultTokenizeBenchmark.tokenizeMedium  thrpt    5    24,724 ±  0,239  ops/s
[info] GraalVmPythonDefaultTokenizeBenchmark.tokenizeShort   thrpt    5   625,756 ± 24,527  ops/s
[info] GraalVmPythonPooledTokenizeBenchmark.tokenizeLong     thrpt    5     1,475 ±  0,375  ops/s
[info] GraalVmPythonPooledTokenizeBenchmark.tokenizeMedium   thrpt    5    87,482 ±  1,239  ops/s
[info] GraalVmPythonPooledTokenizeBenchmark.tokenizeShort    thrpt    5  2374,671 ± 32,192  ops/s
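The rows are easier to compare as ratios. The sketch below recomputes them from the scores above (transcribed by hand with dots as decimal separators; `BenchmarkRatios` is purely illustrative). For the short input, the pooled GraalVM Python tokenizer reaches roughly 3.8 times the throughput of the default one and about 142 times that of the CLI; for the long input the picture reverses, and the CLI variant is faster than either GraalVM variant (about 8.4 times the pooled throughput).

```scala
// Illustrative only: scores (ops/s) transcribed from the JMH run above.
object BenchmarkRatios {
  val cli          = Map("short" -> 16.699, "medium" -> 16.245, "long" -> 12.419)
  val graalDefault = Map("short" -> 625.756, "medium" -> 24.724, "long" -> 0.958)
  val graalPooled  = Map("short" -> 2374.671, "medium" -> 87.482, "long" -> 1.475)

  // Throughput of `a` divided by throughput of `b` per input size;
  // a value above 1 means `a` handles more operations per second.
  def ratios(a: Map[String, Double], b: Map[String, Double]): Map[String, Double] =
    a.map { case (size, score) => size -> score / b(size) }

  def main(args: Array[String]): Unit = {
    val pooledVsDefault = ratios(graalPooled, graalDefault)
    val pooledVsCli     = ratios(graalPooled, cli)
    for (size <- List("short", "medium", "long"))
      println(f"$size%-6s pooled/default=${pooledVsDefault(size)}%.2f pooled/cli=${pooledVsCli(size)}%.2f")
  }
}
```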