Schema-to-case-class code generation for working with Avro in Scala.
- avrohugger-core: Generate source code at runtime for evaluation at a later step.
- avrohugger-filesorter: Sort schema files for proper compilation order.
- avrohugger-tools: Generate source code at the command line with the avrohugger-tools jar.
Alternative Distributions:
- sbt: sbt-avrohugger - Generate source code at compile time with an sbt plugin.
- Maven: avrohugger-maven-plugin - Generate source code at compile time with a Maven plugin.
- Mill: mill-avro - Generate source code at compile time with a Mill plugin.
- Gradle: gradle-avrohugger-plugin - Generate source code at compile time with a Gradle plugin.
- mu-rpc: mu-scala - Generate rpc models, messages, clients, and servers.
- Supported Formats: Standard, SpecificRecord
- Supported Datatypes
- Logical Types Support
- Protocol Support
- Doc Support
- Usage
- Warnings
- Best Practices
- Testing
- Credits
- `Standard`: Vanilla case classes (for use with Apache Avro's `GenericRecord` API, etc.)
- `SpecificRecord`: Case classes that implement `SpecificRecordBase` and therefore have mutable `var` fields (for use with the Avro Specific API - Scalding, Spark, Avro, etc.).
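To illustrate the difference between the two formats, here is a hand-written sketch of the two output shapes for a hypothetical `User` record with a single `name` field. Note this is an approximation: the real `SpecificRecord` output also extends `SpecificRecordBase` and implements `get`/`put`/`getSchema`, which are elided here.

```scala
// Standard format (sketch): an immutable, vanilla case class.
case class StandardUser(name: String)

// SpecificRecord format (sketch): fields are mutable vars so the
// Avro Specific API can populate an instance field-by-field.
// The real generated class also extends SpecificRecordBase.
case class SpecificUser(var name: String) {
  def this() = this("") // no-arg constructor, required by the Specific API
}

object FormatSketch {
  def demo(): (String, String) = {
    val s = StandardUser("Ada")
    val u = new SpecificUser()
    u.name = "Ada" // mutable field, set after construction
    (s.name, u.name)
  }
}
```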
| Avro | Standard | SpecificRecord | Notes |
|---|---|---|---|
| INT | Int | Int | See Logical Types: date |
| LONG | Long | Long | See Logical Types: timestamp-millis |
| FLOAT | Float | Float | |
| DOUBLE | Double | Double | |
| STRING | String | String | |
| BOOLEAN | Boolean | Boolean | |
| NULL | Null | Null | |
| MAP | Map | Map | |
| ENUM | scala.Enumeration, Scala case object, Java Enum, EnumAsScalaString, Scala 3 Enum | Java Enum, EnumAsScalaString | See Customizable Type Mapping |
| BYTES | Array[Byte], BigDecimal | Array[Byte], BigDecimal | See Logical Types: decimal |
| FIXED | case class, case class + schema | case class extending SpecificFixed | See Logical Types: decimal |
| ARRAY | Seq, List, Array, Vector | Seq, List, Array, Vector | See Customizable Type Mapping |
| UNION | Option, Either, Shapeless Coproduct, Scala 3 Union Types | Option, Either, Shapeless Coproduct, Scala 3 Union Types | See Customizable Type Mapping |
| RECORD | case class, case class + schema | case class extending SpecificRecordBase | See Customizable Type Mapping |
| PROTOCOL | No Type, Scala ADT | RPC trait, Scala ADT | See Customizable Type Mapping |
| Date | java.time.LocalDate, java.sql.Date, Int | java.time.LocalDate, java.sql.Date, Int | See Customizable Type Mapping |
| TimeMillis | java.time.LocalTime, Int | java.time.LocalTime, Int | See Customizable Type Mapping |
| TimeMicros | java.time.LocalTime, Long | java.time.LocalTime, Long | See Customizable Type Mapping |
| TimestampMillis | java.time.Instant, java.sql.Timestamp, Long | java.time.Instant, java.sql.Timestamp, Long | See Customizable Type Mapping |
| TimestampMicros | java.time.Instant, java.sql.Timestamp, Long | java.time.Instant, java.sql.Timestamp, Long | See Customizable Type Mapping |
| LocalTimestampMillis | java.time.LocalDateTime, Long | java.time.LocalDateTime, Long | See Customizable Type Mapping |
| LocalTimestampMicros | java.time.LocalDateTime, Long | java.time.LocalDateTime, Long | See Customizable Type Mapping |
| UUID | java.util.UUID | java.util.UUID | See Customizable Type Mapping |
| Decimal | BigDecimal | BigDecimal | See Customizable Type Mapping |
NOTE: Currently, logical types are only supported for the Standard and SpecificRecord formats.

- date: Annotates Avro `int` schemas to generate `java.time.LocalDate` or `java.sql.Date` (See Customizable Type Mapping). Examples: avdl, avsc.
- decimal: Annotates Avro `bytes` and `fixed` schemas to generate `BigDecimal`. Examples: avdl, avsc.
- timestamp-millis: Annotates Avro `long` schemas to generate `java.time.Instant` or `java.sql.Timestamp` or `Long` (See Customizable Type Mapping). Examples: avdl, avsc.
- uuid: Annotates Avro `string` schemas and IDLs to generate `java.util.UUID` (See Customizable Type Mapping). Example: avsc.
- time-millis: Annotates Avro `int` schemas to generate `java.time.LocalTime` or `java.sql.Time` or `Int`.
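For example, a minimal hypothetical `.avsc` record that uses the `date` and `decimal` annotations might look like this (per the Avro spec, `precision` and `scale` sit alongside the `logicalType` attribute):

```json
{
  "type": "record",
  "name": "Transaction",
  "fields": [
    { "name": "day",    "type": { "type": "int", "logicalType": "date" } },
    { "name": "amount", "type": { "type": "bytes", "logicalType": "decimal",
                                  "precision": 9, "scale": 2 } }
  ]
}
```

With the default type mappings, `day` would be generated as a `java.time.LocalDate` field and `amount` as a `BigDecimal` field.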
- The records defined in `.avdl`, `.avpr`, and JSON protocol strings can be generated as ADTs if the protocols define more than one Scala definition (note: message definitions are ignored when this setting is used). See Customizable Type Mapping.
- For `SpecificRecord`, if the protocol contains messages, then an RPC trait is generated (instead of generating an ADT, or ignoring the message definitions).
- `.avdl`: Comments that begin with `/**` are used as the documentation string for the type or field definition that follows the comment.
- `.avsc`, `.avpr`, and `.avro`: Docs in Avro schemas are used to define a case class' ScalaDoc.
- `.scala`: ScalaDocs of case class definitions are used to define record and field docs.
Note: Currently Treehugger appears to generate Javadoc style docs (thus compatible with ScalaDoc style).
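For instance, doc comments in a small hypothetical `.avdl` file:

```
protocol Example {
  /** A user of the system. */
  record User {
    /** The user's full name. */
    string name;
  }
}
```

Both comments would be carried over as ScalaDoc: the first on the generated `User` case class, the second on its `name` field.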
- Library for Scala 2.12, 2.13, and 3
- Parses schemas and IDLs with Avro version 1.11
- Generates code compatible with Scala 2.12, 2.13, and 3
"com.julianpeeters" %% "avrohugger-core" % "2.15.0"
Instantiate a `Generator` with the `Standard` or `SpecificRecord` source format, then use `tToFile(input: T, outputDir: String): Unit` or `tToStrings(input: T): List[String]`, where `T` can be `File`, `Schema`, or `String` (e.g. `fileToFile`, `schemaToStrings`).
import avrohugger.Generator
import avrohugger.format.SpecificRecord
import java.io.File
val schemaFile = new File("path/to/schema")
val generator = new Generator(SpecificRecord)
generator.fileToFile(schemaFile, "optional/path/to/output") // default output path = "target/generated-sources"
where an input File can be .avro, .avsc, .avpr, or .avdl,
and where an input String can be the string representation of an Avro schema,
protocol, IDL, or a set of case classes that you'd like to have implement
SpecificRecordBase.
To reassign Scala types to Avro types, use the following (e.g. for customizing Specific):
import avrohugger.format.SpecificRecord
import avrohugger.types.ScalaVector
val myScalaTypes = Some(SpecificRecord.defaultTypes.copy(array = ScalaVector))
val generator = new Generator(SpecificRecord, avroScalaCustomTypes = myScalaTypes)
- `record` can be assigned to `ScalaCaseClass` and `ScalaCaseClassWithSchema` (with schema in a companion object)
- `array` can be assigned to `ScalaSeq`, `ScalaArray`, `ScalaList`, and `ScalaVector`
- `enum` can be assigned to `JavaEnum`, `ScalaCaseObjectEnum`, `EnumAsScalaString`, `ScalaEnumeration`, and `Scala3Enum`
- `fixed` can be assigned to `ScalaCaseClassWrapper` and `ScalaCaseClassWrapperWithSchema` (with schema in a companion object)
- `union` can be assigned to `OptionShapelessCoproduct`, `OptionEitherShapelessCoproduct`, `OptionalShapelessCoproduct`, or `OptionScala3UnionType`
- `int`, `long`, `float`, `double` can be assigned to `ScalaInt`, `ScalaLong`, `ScalaFloat`, `ScalaDouble`
- `protocol` can be assigned to `ScalaADT` and `NoTypeGenerated`
- `decimal` can be assigned to e.g. `ScalaBigDecimal(Some(BigDecimal.RoundingMode.HALF_EVEN))` and `ScalaBigDecimalWithPrecision(None)` (via Shapeless Tagged Types)
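As an illustration of one of these choices, `ScalaCaseObjectEnum` renders an Avro enum as a sealed trait with case objects. A hand-written sketch of the shape of that output (the actual names come from your schema; this `Suit` example is hypothetical):

```scala
// Sketch of ScalaCaseObjectEnum-style output for an Avro enum like
// {"type": "enum", "name": "Suit", "symbols": ["SPADES", "HEARTS"]}.
sealed trait Suit
object Suit {
  case object SPADES extends Suit
  case object HEARTS extends Suit
}

object EnumSketch {
  // Matching on the sealed trait is exhaustiveness-checked by the compiler.
  def color(s: Suit): String = s match {
    case Suit.SPADES => "black"
    case Suit.HEARTS => "red"
  }
}
```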
Specifically for unions:
| Field Type ⬇️ / Behaviour ➡️ | OptionShapelessCoproduct | OptionEitherShapelessCoproduct | OptionalShapelessCoproduct | OptionScala3UnionType |
|---|---|---|---|---|
| `[{"type": "map", "values": "string"}]` | `Map[String, String]` | `Map[String, String]` | `Map[String, String] :+: CNil` | `Map[String, String]` |
| `["null", "double"]` | `Option[Double]` | `Option[Double]` | `Option[Double :+: CNil]` | `Option[Double]` |
| `["int", "string"]` | `Int :+: String :+: CNil` | `Either[Int, String]` | `Int :+: String :+: CNil` | `Int \| String` |
| `["null", "int", "string"]` | `Option[Int :+: String :+: CNil]` | `Option[Either[Int, String]]` | `Option[Int :+: String :+: CNil]` | `Option[Int \| String]` |
| `["boolean", "int", "string"]` | `Boolean :+: Int :+: String :+: CNil` | `Boolean :+: Int :+: String :+: CNil` | `Boolean :+: Int :+: String :+: CNil` | `Boolean \| Int \| String` |
| `["null", "boolean", "int", "string"]` | `Option[Boolean :+: Int :+: String :+: CNil]` | `Option[Boolean :+: Int :+: String :+: CNil]` | `Option[Boolean :+: Int :+: String :+: CNil]` | `Option[Boolean \| Int \| String]` |
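For example, with the `OptionEitherShapelessCoproduct` behaviour, a field declared as `["null", "int", "string"]` is generated as `Option[Either[Int, String]]`, which can be consumed with ordinary pattern matching (`describe` below is a hypothetical helper, not generated code):

```scala
object UnionSketch {
  // ["null", "int", "string"] => Option[Either[Int, String]]
  def describe(field: Option[Either[Int, String]]): String = field match {
    case None           => "null"
    case Some(Left(i))  => s"int: $i"
    case Some(Right(s)) => s"string: $s"
  }
}
```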
Namespaces can be reassigned by instantiating a Generator with a custom
namespace map:
val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("oldnamespace"->"newnamespace"))
Note: Namespace mappings work with KafkaAvroSerializer but not with KafkaAvroDeserializer; if anyone knows how to configure the deserializer to map incoming schema names to target class names, please speak up!
Wildcarding the beginning of a namespace is permitted: place a single asterisk after the prefix that you want to map, and any matching schema will have its namespace rewritten. Multiple conflicting wildcards are not permitted.
val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("example.*"->"example.newnamespace"))
"com.julianpeeters" %% "avrohugger-filesorter" % "2.15.0"
To ensure dependent schemas are compiled in the proper order (thus avoiding `org.apache.avro.SchemaParseException: Undefined name: "com.example.MyRecord"` parser errors), sort avsc and avdl files with the `sortSchemaFiles` method on `AvscFileSorter` and `AvdlFileSorter` respectively.
import avrohugger.filesorter.AvscFileSorter
import java.io.File
val sorted: List[File] = AvscFileSorter.sortSchemaFiles((srcDir ** "*.avsc").get)
Download the avrohugger-tools jar for Scala 2.12, Scala 2.13 (>30MB!), or Scala 3 and use it like the avro-tools jar. Usage: `[-string] (schema|protocol|datafile) input... outputdir`:

`generate` generates Scala case class definitions:

java -jar /path/to/avrohugger-tools_3-2.15.0-assembly.jar generate schema user.avsc .

`generate-specific` generates definitions that extend Avro's `SpecificRecordBase`:

java -jar /path/to/avrohugger-tools_3-2.15.0-assembly.jar generate-specific schema user.avsc .
- If your framework is one that relies on reflection to get the Schema, it will fail, since Scala fields are private. Therefore, preempt it by passing in a Schema to DatumReaders and DatumWriters (e.g. `val sdw = SpecificDatumWriter[MyRecord](schema)`).
- For the `SpecificRecord` format, generated case class fields must be mutable (`var`) in order to be compatible with the SpecificRecord API. Note: if your framework allows `GenericRecord`, avro4s provides a type class that converts to and from immutable case classes cleanly.
- `SpecificRecord` requires that `enum` be represented as `JavaEnum`.
To test for regressions, please run `sbt:avrohugger> + test`.

To test that generated code can be de/serialized as expected, please:
- run `sbt:avrohugger> + publishLocal`
- then clone sbt-avrohugger and update its avrohugger dependency to the locally published version
- finally run `sbt:sbt-avrohugger> scripted avrohugger/*`, or, e.g., `scripted avrohugger/GenericSerializationTests`
Depends on Avro and Treehugger. avrohugger-tools is based on avro-tools.
Contributors: