The Scala library provides Tuple1 to Tuple22, which allow programmers to hold a fixed number of items together so they can be passed as a single object. While all the elements in an Array have the same type, a TupleN can have a mix of element types, e.g.
scala> val mytuple = ((2, "Be"), "Or", "Not", (2, "Be"))
mytuple: ((Int, String), String, String, (Int, String)) = ((2,Be),Or,Not,(2,Be))
scala> mytuple._1
res1: (Int, String) = (2,Be)
In this example, mytuple is a Tuple4 and has both Int and String elements.
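The difference from an Array can be sketched with the standard library alone. The object and value names below are made up for illustration; the point is that an Array holding an Int and a String widens to Array[Any], while a tuple keeps a static type per position.

```scala
object TupleVsArray {
  // An Array has a single element type; mixing Int and String widens it to Any.
  val mixedArray: Array[Any] = Array(2, "Be")

  // A tuple keeps a separate static type for each position.
  val pair: (Int, String) = (2, "Be")

  // No cast is needed to use the elements at their real types.
  val doubled: Int = pair._1 * 2
  val shout: String = pair._2.toUpperCase

  def main(args: Array[String]): Unit = {
    println(doubled)
    println(shout)
  }
}
```

Reading an element back out of mixedArray would need a cast or a pattern match before it could be used as an Int or a String.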
The same code using Avro tuples looks like this:
scala> val mytuple = AvroTuple4(AvroTuple2(2, "Be"), "Or", "Not", AvroTuple2(2, "Be"))
mytuple: com.github.massie.avrotuples.AvroTuple4[com.github.massie.avrotuples.AvroTuple2[Int,String],String,String,com.github.massie.avrotuples.AvroTuple2[Int,String]] = ((2,Be),Or,Not,(2,Be))
scala> mytuple._1
res0: com.github.massie.avrotuples.AvroTuple2[Int,String] = (2,Be)
Avro tuples is published to Maven Central.
In Maven, use
<dependency>
<groupId>com.github.massie</groupId>
<artifactId>avrotuples_**SCALA_VERSION**</artifactId>
<version>**AVROTUPLES_VERSION**</version>
</dependency>
In sbt, add the line
libraryDependencies += "com.github.massie" %% "avrotuples" % "**AVROTUPLES_VERSION**"
Note that for sbt you don't need to specify the Scala version, since the line above uses %%, which automatically selects the correct Scala version.
- Avro tuples can serve as a drop-in replacement for Scala tuples
- AvroTuple2 has a swap method just like Tuple2
- All Avro tuples extend ProductN, e.g. AvroTuple1[T1] extends Product1[T1]
- Avro tuples implement Externalizable, making them Java serializable
- Avro tuples can be nested
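As a rough illustration of what implementing Externalizable buys you, here is a minimal sketch that uses only the JDK. PairSketch is a hypothetical stand-in, not the library's AvroTuple2, and it writes its fields directly, which may well differ from what Avro tuples write internally. It shows the two requirements Java serialization places on such a class: a public no-argument constructor and a writeExternal/readExternal pair.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, Externalizable,
  ObjectInput, ObjectInputStream, ObjectOutput, ObjectOutputStream}

// Hypothetical stand-in for illustration only; not the library's AvroTuple2.
// Externalizable needs a public no-argument constructor and mutable fields.
class PairSketch(var first: String, var second: Long) extends Externalizable {
  def this() = this(null, 0L)

  // Write the fields in a fixed order.
  override def writeExternal(out: ObjectOutput): Unit = {
    out.writeUTF(first)
    out.writeLong(second)
  }

  // Read the fields back in the same order they were written.
  override def readExternal(in: ObjectInput): Unit = {
    first = in.readUTF()
    second = in.readLong()
  }
}

object ExternalizableRoundTrip {
  // Serialize to a byte array with Java serialization, then read it back.
  def roundTrip(p: PairSketch): PairSketch = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(p)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[PairSketch] finally in.close()
  }
}
```

Calling ExternalizableRoundTrip.roundTrip(new PairSketch("One", 1L)) returns a new instance carrying the same two values.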
This interface allows Avro to (de)serialize Avro tuples. An Avro serialize/deserialize round-trip looks like this:
val tuple = AvroTuple2("This", AvroTuple4("That", "and", "the", "other"))
val outTuple = AvroTuple2.fromBytes(tuple.toBytes)
assert(tuple == outTuple)
If you pass Avro tuples to Kryo, they will be (de)serialized in Avro format using the Avro tuple schema.
You can update the values for an Avro tuple without needing to create a new tuple, e.g.
val tuple = AvroTuple2("One", 1L)
assert(tuple._1 == "One")
assert(tuple._2 == 1L)
tuple.update("Two", 2L)
assert(tuple._1 == "Two")
assert(tuple._2 == 2L)
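For contrast, standard Scala tuples are immutable, so changing a value means building a second tuple. A minimal sketch using only the standard library (the object and value names are illustrative):

```scala
object CopyVersusUpdate {
  val original: (String, Long) = ("One", 1L)

  // copy builds a second tuple; the original is left untouched.
  val changed: (String, Long) = original.copy(_1 = "Two", _2 = 2L)
}
```

With the in-place update shown above, no second object is created, which can matter when the same tuple instance is reused many times.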
Scala provides syntactic sugar that Avro tuples do not. In Scala, you don't need to write Tuple2("a", "b"); you can just use ("a", "b"). Avro tuple code is more verbose.
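The sugar in question can be seen with plain Scala tuples. In this small sketch (names are illustrative), the explicit constructor, the parenthesized literal, and the arrow form all produce equal Tuple2 values:

```scala
object TupleSugar {
  // Explicit constructor call.
  val explicit: Tuple2[String, String] = Tuple2("a", "b")

  // Literal syntax; the compiler builds the same Tuple2.
  val literal: (String, String) = ("a", "b")

  // Arrow syntax, available for pairs only.
  val arrow: (String, String) = "a" -> "b"

  val allEqual: Boolean = explicit == literal && literal == arrow
}
```

Avro tuples have only the first, constructor-style form, as in AvroTuple2("a", "b").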
For now, Avro tuples can contain null values, strings, booleans, floats, doubles, ints, and longs. Support for more types, e.g. Option, is coming.
There is a known issue with Avro/Parquet and recursive schemas. Avro tuples use a recursive schema in order to support nesting. If you are using Avro tuples with Parquet, you will need to use the AvroFlatTupleX types, since they have flat schemas.
Avro tuples is released under an Apache 2.0 license.
Pull requests are welcomed.