Jsoniter Scala

Scala macros that generate codecs for case classes, standard types and collections to get maximum performance of JSON parsing & serialization.

Latest results of benchmarks, which compare the parsing & serialization performance of Jsoniter Scala vs. the Jackson, Circe and Play-JSON libraries using JDK 8 & JDK 9 in the following environment: Intel® Core™ i7-7700HQ CPU @ 2.8GHz (max 3.8GHz), RAM 16GB DDR4-2400, Ubuntu 16.04, Linux notebook 4.13.0-32-generic, Oracle JDK 64-bit (builds 1.8.0_161-b12 and 9.0.4+11 respectively).

Goals

Initially this library was developed for the requirements of real-time bidding in ad-tech, and the goals were simple:

  • do parsing & serialization of JSON directly from UTF-8 bytes to your case classes & Scala collections and back but do it crazily fast w/o reflection, intermediate trees, strings or events, w/ minimum allocations & copying
  • do validation of UTF-8 encoding, JSON format & mapped values efficiently with clear reporting, and do not replace illegally encoded characters of string values with placeholder characters
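
The second goal contrasts with the JDK's default string decoding, where `new String(bytes, UTF_8)` silently substitutes U+FFFD for malformed input. Below is a minimal sketch of that difference using only `java.nio.charset`. The names `Utf8Check` and `isValidUtf8` are hypothetical and exist only for illustration; this is not how the library validates input internally.

```scala
import java.nio.ByteBuffer
import java.nio.charset.{CharacterCodingException, CodingErrorAction, StandardCharsets}

object Utf8Check {
  // Strict decoding: report malformed bytes instead of replacing them.
  def isValidUtf8(bytes: Array[Byte]): Boolean =
    try {
      StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT)
        .decode(ByteBuffer.wrap(bytes))
      true
    } catch {
      case _: CharacterCodingException => false
    }
}

object Utf8CheckDemo extends App {
  // A two-byte lead byte (0xC3) followed by a byte that is not a continuation byte.
  val malformed = Array(0xC3.toByte, 0x28.toByte)

  // Lenient decoding hides the problem behind a replacement character.
  val lenient = new String(malformed, StandardCharsets.UTF_8)
  println(s"lenient decoding contains U+FFFD: ${lenient.contains('\uFFFD')}")

  // Strict decoding surfaces it as an error.
  println(s"strict validation passes: ${Utf8Check.isValidUtf8(malformed)}")
}
```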

It targets JDK 8+ w/o any platform restrictions.

Support of Scala.js & Scala Native is not a goal for the moment.

Features and limitations

  • JSON parsing from Array[Byte] or java.io.InputStream
  • JSON serialization to Array[Byte] or java.io.OutputStream
  • Parsing of streaming JSON values and JSON arrays from java.io.InputStream without the need to hold all parsed values in memory
  • Support for reading a part of Array[Byte] by specifying the position and limit to read from/to
  • Support for writing to a pre-allocated Array[Byte] by specifying the position to write from
  • Support of UTF-8 encoding
  • Parsing of strings with escaped characters for JSON keys and string values
  • Codecs can be generated for primitives, boxed primitives, enums, String, BigInt, BigDecimal, Option, tuples, java.util.UUID, java.time.*, Scala collections, arrays, module classes, value classes and case classes with values/fields having any of types listed here
  • Case classes should be defined as a top-level class or directly inside another class or object, with a public constructor that has one list of arguments for all non-transient fields
  • Types that are supported as map keys are primitives, boxed primitives, enums, String, BigInt, BigDecimal, java.util.UUID, java.time.*, and value classes for any of them
  • Support of ADTs with a sealed trait or sealed abstract class as the base and case classes or case objects as leaf classes, using a discriminator field with a string value
  • Implicitly resolvable codecs for any types
  • Support only acyclic graphs of class instances
  • Fields with default values that are defined in the constructor are optional, other fields are required (no special annotation required)
  • Fields with values that are equal to their default values, or that are empty options/collections/arrays, are not serialized to provide sparse output
  • Fields can be annotated as transient or just not defined in the constructor to avoid parsing and serializing them at all
  • Field names can be overridden for serialization/parsing by a field annotation in case classes
  • A parsing exception always reports the hexadecimal offset in the Array[Byte] or InputStream where it occurred, and optionally a hex dump of the part of the internal byte buffer affected by the error
  • Reading/writing of numeric fields from/to string values, configurable by a field annotation
  • No extra buffering is required when parsing from InputStream or serializing to OutputStream
  • No dependencies on extra libraries except Scala's scala-library and scala-reflect
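
The rules for default values listed above can be illustrated with a small sketch. It assumes the same `JsonCodecMaker.make`, `read` and `write` calls shown in the "How to use" section and needs the library on the classpath; the `Endpoint` type and `DefaultsExample` object are hypothetical names introduced only for this example.

```scala
import com.github.plokhotnyuk.jsoniter_scala.macros._
import com.github.plokhotnyuk.jsoniter_scala.core._

// Hypothetical example type: `port` has a default value, so it is optional in JSON.
case class Endpoint(host: String, port: Int = 8080)

object DefaultsExample extends App {
  implicit val codec: JsonCodec[Endpoint] = JsonCodecMaker.make[Endpoint](CodecMakerConfig())

  // `port` is missing in the input, so the constructor default should be used.
  val parsed: Endpoint = read("""{"host":"localhost"}""".getBytes("UTF-8"))
  assert(parsed == Endpoint("localhost", 8080))

  // `port` equals its default here, so per the feature list it should be omitted from the output.
  val bytes: Array[Byte] = write(Endpoint("localhost"))
  println(new String(bytes, "UTF-8"))
}
```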

There are a number of configurable options that can be set at compile time:

  • Ability to read/write numbers in containers from/to string values
  • Skipping of unexpected fields or throwing of parse exceptions
  • Mapping function for names between case classes and JSON, including predefined functions which enforce snake_case or camelCase names for all fields
  • Name of a discriminator field for ADTs
  • Mapping function for values of a discriminator field that is used to distinguish classes of ADTs
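
To make the idea of a name-mapping function concrete, here is a minimal sketch of a camelCase to snake_case conversion in plain Scala. `NameMapping.toSnakeCase` is a hypothetical helper written for this illustration; it is not the library's predefined mapper and its handling of acronyms is deliberately naive.

```scala
object NameMapping {
  // Hypothetical helper: inserts '_' before every upper-case letter
  // (except at the start) and lower-cases it.
  def toSnakeCase(name: String): String = {
    val sb = new StringBuilder
    name.foreach { ch =>
      if (ch.isUpper) {
        if (sb.nonEmpty) sb.append('_')
        sb.append(ch.toLower)
      } else sb.append(ch)
    }
    sb.toString
  }
}

object NameMappingDemo extends App {
  // Field names as they would appear in a case class.
  Seq("id", "deviceId", "lastSeenAt").foreach { field =>
    println(s"$field -> ${NameMapping.toSnakeCase(field)}")
  }
}
```

A function of this shape (String to String) is the kind of value a compile-time name-mapping option would accept; check the library sources for the exact option name and the predefined mappers.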

List of options that change parsing & serialization at runtime:

  • Serialization of strings with escaped Unicode characters to be ASCII compatible
  • Indenting of output and its step
  • Throwing of stackless parsing exceptions to greatly reduce their impact on performance
  • Turning off the hex dump of the part of the internal byte buffer affected by an error to reduce the impact on performance
  • Preferred size of internal buffers when parsing from InputStream or serializing to OutputStream

For upcoming features and fixes see the Commits and Issues pages.

How to use

Add the library to your dependencies list

libraryDependencies += "com.github.plokhotnyuk.jsoniter-scala" %% "macros" % "0.11.0"

Generate codecs for your case classes, collections, etc.

import com.github.plokhotnyuk.jsoniter_scala.macros._
import com.github.plokhotnyuk.jsoniter_scala.core._

case class Device(id: Int, model: String)

case class User(name: String, devices: Seq[Device])

implicit val codec: JsonCodec[User] = JsonCodecMaker.make[User](CodecMakerConfig())

That's it! You have generated an instance of com.github.plokhotnyuk.jsoniter_scala.core.JsonCodec.

Now you can use it for parsing & serialization:

val user = read("""{"name":"John","devices":[{"id":1,"model":"HTC One X"}]}""".getBytes("UTF-8"))
val json = write(User(name = "John", devices = Seq(Device(id = 2, model = "iPhone X"))))
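
Putting the pieces together, the following sketch performs a full round trip. It uses only the calls shown above (`JsonCodecMaker.make`, `read`, `write`) and assumes that `write` returns an `Array[Byte]`, as the feature list suggests; the `RoundTrip` object is a hypothetical name, and the library must be on the classpath.

```scala
import com.github.plokhotnyuk.jsoniter_scala.macros._
import com.github.plokhotnyuk.jsoniter_scala.core._

case class Device(id: Int, model: String)

case class User(name: String, devices: Seq[Device])

object RoundTrip extends App {
  implicit val codec: JsonCodec[User] = JsonCodecMaker.make[User](CodecMakerConfig())

  val original = User(name = "John", devices = Seq(Device(id = 2, model = "iPhone X")))

  // Serialize to UTF-8 bytes, then parse those bytes back.
  val bytes: Array[Byte] = write(original)
  val parsed: User = read(bytes)

  // The parsed value should be equal to the one that was serialized.
  assert(parsed == original)
  println(new String(bytes, "UTF-8"))
}
```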

To see the generated code, add the following line to your sbt build file:

scalacOptions += "-Xmacro-settings:print-codecs"

For more use cases & examples, please check out the tests.

How to develop

Feel free to ask questions in chat, open issues, or contribute by creating pull requests (fixes and improvements to docs, code and tests are highly appreciated).

Run tests, check coverage and binary compatibility

sbt -J-XX:MaxMetaspaceSize=512m clean +coverage +test +coverageReport +mimaReportBinaryIssues

Run benchmarks

The sbt plugin for the JMH tool is used for benchmarking. To see all of their features & options, please check the Sbt-JMH docs and the JMH tool docs.

Learn how to write benchmarks from the JMH samples and the JMH articles posted in Aleksey Shipilёv’s and Nitsan Wakart’s blogs.
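
As a rough orientation, a JMH benchmark for a generated codec could look like the sketch below. The class name `UserBenchmark`, the sample types and the payload are hypothetical and are not taken from this repository's benchmark module; only the standard JMH annotations and the `read` call from the "How to use" section are assumed.

```scala
import java.util.concurrent.TimeUnit

import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._
import org.openjdk.jmh.annotations._

case class Device(id: Int, model: String)

case class User(name: String, devices: Seq[Device])

// One benchmark state instance per thread; throughput is reported in ops per second.
@State(Scope.Thread)
@BenchmarkMode(Array(Mode.Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
class UserBenchmark {
  implicit val codec: JsonCodec[User] = JsonCodecMaker.make[User](CodecMakerConfig())

  // Input is prepared once, outside of the measured method.
  val jsonBytes: Array[Byte] =
    """{"name":"John","devices":[{"id":1,"model":"HTC One X"}]}""".getBytes("UTF-8")

  // Returning the parsed value lets JMH consume it, so the call is not optimized away.
  @Benchmark
  def readJsoniter(): User = read(jsonBytes)
}
```

With a class like this in the benchmark module, a regular expression such as `.*UserBenchmark.*` passed to `benchmark/jmh:run` would select it.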

The list of available options can be printed by:

sbt 'benchmark/jmh:run -h'

JMH allows running benchmarks with different profilers. To get the list of supported profilers use:

sbt 'benchmark/jmh:run -lprof'

Help for profiler options can be printed by the following command:

sbt 'benchmark/jmh:run -prof <profiler_name>:help'

To get results for some benchmarks in a flight recording file (which you can then open and analyse offline using JMC) use a command like this:

sbt clean 'benchmark/jmh:run -prof jmh.extras.JFR -wi 10 -i 50 .*GoogleMapsAPI.*readJsoniter.*'

On Linux the perf profiler can be used to see CPU event statistics normalized per ops:

sbt -no-colors clean 'benchmark/jmh:run -prof perfnorm .*TwitterAPI.*' >twitter_api_perfnorm.txt

The following command can be used to profile & print the assembly code of the hottest methods, but it requires setting up an additional library to enable the PrintAssembly feature:

sbt -no-colors clean 'benchmark/jmh:run -prof perfasm -wi 10 -i 10 .*Adt.*readJsoniter.*' >read_adt_perfasm.txt

To see the throughput together with the allocation rate of generated codecs, run benchmarks with the GC profiler using the following command:

sbt -no-colors clean 'benchmark/jmh:run -prof gc .*Benchmark.*' >gc.txt

Results of benchmarks can be stored in different formats: *.csv, *.json, etc. All supported formats can be listed by:

sbt 'benchmark/jmh:run -lrf'

Results that are stored in JSON can be easily plotted in JMH Visualizer by dragging & dropping your file onto the drop zone, or by using the source parameter with an HTTP link to your file in the URL like here.

For more info about extras, including jmh.extras.Async and the ability to generate flame graphs, see the Sbt-JMH docs.

Publish locally

Publish to local Ivy repo:

sbt publishLocal

Publish to local Maven repo:

sbt publishM2

Release

For version numbering use the Recommended Versioning Scheme that is used in the Scala ecosystem.

Double check binary & source compatibility (including behaviour) and release using the following command (credentials required):

sbt release

Do not push changes to GitHub until the promoted artifacts for the new version are available for download from the Maven Central Repository, to avoid binary compatibility check failures in triggered Travis CI builds.

Acknowledgements

This library started from macros that reused the Jsoniter Java reader & writer and generated codecs for them, but then evolved to have its own core mechanics for parsing & serialization.

The idea to generate codecs by Scala macros & the main details were borrowed from Kryo Macros and adapted for the needs of the JSON domain.

Other Scala macros features were picked up from the AVSystem Commons Library for Scala.