Schema-to-case-class code generation for working with Avro in Scala.
- avrohugger-core: Generate source code at runtime for evaluation at a later step.
- avrohugger-filesorter: Sort schema files for proper compilation order.
- avrohugger-tools: Generate source code at the command line with the avrohugger-tools jar.
Alternative Distributions:
- sbt: sbt-avrohugger - Generate source code at compile time with an sbt plugin.
- Maven: avrohugger-maven-plugin - Generate source code at compile time with a Maven plugin.
- Mill: mill-avro - Generate source code at compile time with a Mill plugin.
- Gradle: gradle-avrohugger-plugin - Generate source code at compile time with a Gradle plugin.
- mu-rpc: mu-scala - Generate RPC models, messages, clients, and servers.
- Supported Formats: Standard, SpecificRecord
- Supported Datatypes
- Logical Types Support
- Protocol Support
- Doc Support
- Usage
- Warnings
- Best Practices
- Testing
- Credits
- Standard: Vanilla case classes (for use with Apache Avro's GenericRecord API, etc.)
- SpecificRecord: Case classes that implement SpecificRecordBase and therefore have mutable var fields (for use with the Avro Specific API - Scalding, Spark, Avro, etc.).
Avro | Standard | SpecificRecord | Notes |
---|---|---|---|
INT | Int | Int | See Logical Types: date |
LONG | Long | Long | See Logical Types: timestamp-millis |
FLOAT | Float | Float | |
DOUBLE | Double | Double | |
STRING | String | String | |
BOOLEAN | Boolean | Boolean | |
NULL | Null | Null | |
MAP | Map | Map | |
ENUM | scala.Enumeration, Scala case object, Java Enum, EnumAsScalaString | Java Enum, EnumAsScalaString | See Customizable Type Mapping |
BYTES | Array[Byte], BigDecimal | Array[Byte], BigDecimal | See Logical Types: decimal |
FIXED | case class, case class + schema | case class extending SpecificFixed | See Logical Types: decimal |
ARRAY | Seq, List, Array, Vector | Seq, List, Array, Vector | See Customizable Type Mapping |
UNION | Option, Either, Shapeless Coproduct | Option, Either, Shapeless Coproduct | See Customizable Type Mapping |
RECORD | case class, case class + schema | case class extending SpecificRecordBase | See Customizable Type Mapping |
PROTOCOL | No Type, Scala ADT | RPC trait, Scala ADT | See Customizable Type Mapping |
Date | java.time.LocalDate, java.sql.Date, Int | java.time.LocalDate, java.sql.Date, Int | See Customizable Type Mapping |
TimeMillis | java.time.LocalTime, Int | java.time.LocalTime, Int | See Customizable Type Mapping |
TimeMicros | java.time.LocalTime, Long | java.time.LocalTime, Long | See Customizable Type Mapping |
TimestampMillis | java.time.Instant, java.sql.Timestamp, Long | java.time.Instant, java.sql.Timestamp, Long | See Customizable Type Mapping |
TimestampMicros | java.time.Instant, java.sql.Timestamp, Long | java.time.Instant, java.sql.Timestamp, Long | See Customizable Type Mapping |
LocalTimestampMillis | java.time.LocalDateTime, Long | java.time.LocalDateTime, Long | See Customizable Type Mapping |
LocalTimestampMicros | java.time.LocalDateTime, Long | java.time.LocalDateTime, Long | See Customizable Type Mapping |
UUID | java.util.UUID | java.util.UUID | See Customizable Type Mapping |
Decimal | BigDecimal | BigDecimal | See Customizable Type Mapping |
NOTE: Currently logical types are only supported for Standard and SpecificRecord formats.
- date: Annotates Avro int schemas to generate java.time.LocalDate or java.sql.Date (See Customizable Type Mapping). Examples: avdl, avsc.
- decimal: Annotates Avro bytes and fixed schemas to generate BigDecimal. Examples: avdl, avsc.
- timestamp-millis: Annotates Avro long schemas to generate java.time.Instant or java.sql.Timestamp or long (See Customizable Type Mapping). Examples: avdl, avsc.
- uuid: Annotates Avro string schemas and idls to generate java.util.UUID (See Customizable Type Mapping). Example: avsc.
- time-millis: Annotates Avro int schemas to generate java.time.LocalTime or java.sql.Time or int
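As a rough illustration of the date annotation, the sketch below feeds a schema string with a logicalType of date through a Standard generator. The schema, the record name Event, and the field name occurredOn are made up for this example, and the sketch assumes the default type mapping is left untouched, in which case the table above suggests the field should come out as java.time.LocalDate.

```scala
import avrohugger.Generator
import avrohugger.format.Standard

object LogicalDateExample {
  // Hypothetical schema: an int field annotated with the "date" logical type.
  val schemaJson: String =
    """{
      |  "type": "record",
      |  "name": "Event",
      |  "namespace": "com.example",
      |  "fields": [
      |    {"name": "occurredOn", "type": {"type": "int", "logicalType": "date"}}
      |  ]
      |}""".stripMargin

  // Generate source code as strings, using the default type mapping.
  val sources: List[String] = new Generator(Standard).stringToStrings(schemaJson)

  def main(args: Array[String]): Unit = sources.foreach(println)
}
```

Printing the result is a quick way to check which Scala type a given logical type maps to before wiring the generator into a build.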
- The records defined in .avdl, .avpr, and JSON protocol strings can be generated as ADTs if the protocols define more than one Scala definition (note: message definitions are ignored when this setting is used). See Customizable Type Mapping.
- For SpecificRecord, if the protocol contains messages then an RPC trait is generated (instead of generating an ADT or ignoring the message definitions).
- .avdl: Comments that begin with /** are used as the documentation string for the type or field definition that follows the comment.
- .avsc, .avpr, and .avro: Docs in Avro schemas are used to define a case class's ScalaDoc.
- .scala: ScalaDocs of case class definitions are used to define record and field docs.
Note: Currently Treehugger appears to generate Javadoc style docs (thus compatible with ScalaDoc style).
- Library For Scala 2.12, 2.13, and 3
- Parses Schemas and IDLs with Avro version 1.11
- Generates Code Compatible with Scala 2.12, 2.13, 3
"com.julianpeeters" %% "avrohugger-core" % "2.8.4"
Instantiate a Generator with Standard or SpecificRecord source formats. Then use tToFile(input: T, outputDir: String): Unit or tToStrings(input: T): List[String], where T can be File, Schema, or String and the method name starts with the input type (for example, fileToFile or stringToStrings).
import avrohugger.Generator
import avrohugger.format.SpecificRecord
import java.io.File
val schemaFile = new File("path/to/schema")
val generator = new Generator(SpecificRecord)
generator.fileToFile(schemaFile, "optional/path/to/output") // default output path = "target/generated-sources"
where an input File can be .avro, .avsc, .avpr, or .avdl, and where an input String can be the string representation of an Avro schema, protocol, IDL, or a set of case classes that you'd like to have implement SpecificRecordBase.
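For string input, a minimal sketch might look like the following. It uses the Standard format and stringToStrings; the schema, the record name User, and its fields are hypothetical and exist only to give the generator something to work on.

```scala
import avrohugger.Generator
import avrohugger.format.Standard

object StringToStringsExample {
  // Hypothetical schema used only for illustration.
  val schemaJson: String =
    """{
      |  "type": "record",
      |  "name": "User",
      |  "namespace": "com.example",
      |  "fields": [
      |    {"name": "name", "type": "string"},
      |    {"name": "age", "type": "int"}
      |  ]
      |}""".stripMargin

  val generator = new Generator(Standard)

  // Each element holds the generated source code as a string.
  val sources: List[String] = generator.stringToStrings(schemaJson)

  def main(args: Array[String]): Unit = sources.foreach(println)
}
```

Working with strings rather than files keeps the generated code in memory, which suits the "evaluation at a later step" use case of avrohugger-core.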
To reassign Scala types to Avro types, use the following (e.g. for customizing SpecificRecord):
import avrohugger.Generator
import avrohugger.format.SpecificRecord
import avrohugger.types.ScalaVector
// Start from the format's defaults and override only the array mapping.
val myScalaTypes = Some(SpecificRecord.defaultTypes.copy(array = ScalaVector))
val generator = new Generator(SpecificRecord, avroScalaCustomTypes = myScalaTypes)
- record can be assigned to ScalaCaseClass and ScalaCaseClassWithSchema (with schema in a companion object)
- array can be assigned to ScalaSeq, ScalaArray, ScalaList, and ScalaVector
- enum can be assigned to JavaEnum, ScalaCaseObjectEnum, EnumAsScalaString, and ScalaEnumeration
- fixed can be assigned to ScalaCaseClassWrapper and ScalaCaseClassWrapperWithSchema (with schema in a companion object)
- union can be assigned to OptionShapelessCoproduct, OptionEitherShapelessCoproduct, or OptionalShapelessCoproduct
- int, long, float, double can be assigned to ScalaInt, ScalaLong, ScalaFloat, ScalaDouble
- protocol can be assigned to ScalaADT and NoTypeGenerated
- decimal can be assigned to e.g. ScalaBigDecimal(Some(BigDecimal.RoundingMode.HALF_EVEN)) and ScalaBigDecimalWithPrecision(None) (via Shapeless Tagged Types)
Specifically for unions:
Field Type ⬇️ / Behaviour ➡️ | OptionShapelessCoproduct | OptionEitherShapelessCoproduct | OptionalShapelessCoproduct |
---|---|---|---|
[{"type": "map", "values": "string"}] | Map[String, String] | Map[String, String] | Map[String, String] :+: CNil |
["null", "double"] | Option[Double] | Option[Double] | Option[Double :+: CNil] |
["int", "string"] | Int :+: String :+: CNil | Either[Int, String] | Int :+: String :+: CNil |
["null", "int", "string"] | Option[Int :+: String :+: CNil] | Option[Either[Int, String]] | Option[Int :+: String :+: CNil] |
["boolean", "int", "string"] | Boolean :+: Int :+: String :+: CNil | Boolean :+: Int :+: String :+: CNil | Boolean :+: Int :+: String :+: CNil |
["null", "boolean", "int", "string"] | Option[Boolean :+: Int :+: String :+: CNil] | Option[Boolean :+: Int :+: String :+: CNil] | Option[Boolean :+: Int :+: String :+: CNil] |
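To see one row of the table in practice, the sketch below selects OptionEitherShapelessCoproduct for unions and generates code for a record with an ["int", "string"] field. The schema, the record name Tagged, and the field name id are hypothetical, and the sketch assumes the union setting is exposed as a union field on defaultTypes, by analogy with the array example earlier.

```scala
import avrohugger.Generator
import avrohugger.format.Standard
import avrohugger.types.OptionEitherShapelessCoproduct

object UnionMappingExample {
  // Hypothetical schema with a two-member, non-null union field.
  val schemaJson: String =
    """{
      |  "type": "record",
      |  "name": "Tagged",
      |  "namespace": "com.example",
      |  "fields": [
      |    {"name": "id", "type": ["int", "string"]}
      |  ]
      |}""".stripMargin

  // Assumption: `union` is the name of the union setting on the types object.
  private val customTypes =
    Some(Standard.defaultTypes.copy(union = OptionEitherShapelessCoproduct))

  val generator = new Generator(Standard, avroScalaCustomTypes = customTypes)

  // According to the table, the id field should be typed as Either[Int, String].
  val sources: List[String] = generator.stringToStrings(schemaJson)

  def main(args: Array[String]): Unit = sources.foreach(println)
}
```

Either avoids a Shapeless dependency in the generated code for two-member unions, while larger unions still fall back to Coproduct as the table shows.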
Namespaces can be reassigned by instantiating a Generator with a custom namespace map:
val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("oldnamespace"->"newnamespace"))
Note: Namespace mappings work with KafkaAvroSerializer but not with KafkaAvroDeserializer; if anyone knows how to configure the deserializer to map incoming schema names to target class names please speak up!
Wildcarding the beginning of a namespace is permitted: place a single asterisk after the prefix that you want to map, and any matching schema will have its namespace rewritten. Multiple conflicting wildcards are not permitted.
val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("example.*"->"example.newnamespace"))
"com.julianpeeters" %% "avrohugger-filesorter" % "2.8.4"
To ensure dependent schemas are compiled in the proper order (thus avoiding org.apache.avro.SchemaParseException: Undefined name: "com.example.MyRecord" parser errors), sort avsc and avdl files with the sortSchemaFiles method on AvscFileSorter and AvdlFileSorter respectively.
import avrohugger.filesorter.AvscFileSorter
import java.io.File
val sorted: List[File] = AvscFileSorter.sortSchemaFiles((srcDir ** "*.avsc"))
Download the avrohugger-tools jar for Scala 2.12 or Scala 2.13 (>30MB!) and use it like the avro-tools jar (Usage: [-string] (schema|protocol|datafile) input... outputdir):
generate generates Scala case class definitions:
java -jar /path/to/avrohugger-tools_2.12-2.8.4-assembly.jar generate schema user.avsc .
generate-specific generates definitions that extend Avro's SpecificRecordBase:
java -jar /path/to/avrohugger-tools_2.12-2.8.4-assembly.jar generate-specific schema user.avsc .
- If your framework is one that relies on reflection to get the Schema, it will fail since Scala fields are private. Therefore preempt it by passing in a Schema to DatumReaders and DatumWriters (e.g. val sdw = new SpecificDatumWriter[MyRecord](schema)).
- For the SpecificRecord format, generated case class fields must be mutable (var) in order to be compatible with the SpecificRecord API. Note: If your framework allows GenericRecord, avro4s provides a type class that converts to and from immutable case classes cleanly.
- SpecificRecord requires that enum be represented as JavaEnum
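The first warning, passing a Schema explicitly rather than relying on reflection, can be sketched with Avro's generic API so that the example does not depend on any generated class. The schema and the record name User are hypothetical; with generated SpecificRecord classes the same idea would apply to SpecificDatumWriter and SpecificDatumReader.

```scala
import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

object ExplicitSchemaRoundTrip {
  // Hypothetical schema used only for illustration.
  val schema: Schema = new Schema.Parser().parse(
    """{"type": "record", "name": "User", "namespace": "com.example",
      | "fields": [{"name": "name", "type": "string"}]}""".stripMargin)

  def roundTrip(name: String): GenericRecord = {
    val record = new GenericData.Record(schema)
    record.put("name", name)

    // The writer receives the schema explicitly, so nothing is looked up by reflection.
    val out = new ByteArrayOutputStream()
    val writer = new GenericDatumWriter[GenericRecord](schema)
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    writer.write(record, encoder)
    encoder.flush()

    // The reader is given the same schema explicitly.
    val reader = new GenericDatumReader[GenericRecord](schema)
    val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
    reader.read(null, decoder)
  }

  def main(args: Array[String]): Unit = println(roundTrip("Ada"))
}
```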
To test for regressions, please run sbt:avrohugger> + test.
To test that generated code can be de/serialized as expected, please run:
- sbt:avrohugger> + publishLocal
- then clone sbt-avrohugger and update its avrohugger dependency to the locally published version
- finally run sbt:sbt-avrohugger> scripted avrohugger/*, or, e.g., scripted avrohugger/GenericSerializationTests
Depends on Avro and Treehugger. avrohugger-tools is based on avro-tools.
Contributors: