Array-based immutable collections for scala.
Blog post explaining the motivation and performance characteristics.
Everything that is there is thoroughly tested using typelevel/discipline. Nevertheless, there are probably a few bugs, and everything is still subject to a lot of change.
These collections use algebra typeclasses such as Eq and Order, and cats typeclasses such as Show instead of relying on the equals method of the element objects, which sometimes does not work (e.g.
Array[Byte]) or does not make sense (
Compact in-memory representation
On modern CPUs, cache concerns are very important. So a compact in-memory representation is often more important for good overall performance than optimal big-O behavior. So in this library, compact in-memory representation is always given priority over reference-heavy trees with theoretically optimal Big-O performance.
This yields very good results regarding compactness and performance for most operations. The downside is that you have to provide ClassTag instances for almost every operation.
Specialization of operations
Most operations are specialized for common types (currently Int, Long, Double), so the specialized instances of typeclasses such as Order can be used to avoid boxing. Note that even for types for which the collections are not specialized, the internal representation for primitives will still be efficient due to the use of primitive arrays.
The scala collections in the standard library mostly implement collection/collection operations in terms of collection/element operations. The approach taken in this library is the opposite: focus on collection/collection operations and to implement collection/element operations in terms of collection/collection operations whenever possible. E.g. adding an element e to a set a will be done by merging a with a single-element set created from e.
Using flat arrays internally is very inefficient when e.g. adding elements one by one to a large collection. But when working with collections in a functional way, this is a pretty rare operation. Usually you apply transformations to the collection as a whole. For that use case, the array-based internal representation is very efficient.
Compatibility with scala collections
An optional interface to scala collections is provided, but the collections itself are not integrated into the scala collections hierarchy. They only implement equals, hashCode and toString on a best-effort basis. But you should use the Eq, Hash and Show typeclasses if possible, since they also work for set elements without working equality (e.g. arrays).
The partial collections in the left column are better for working with individual elements, whereas the total collections in the right column allow more typeclass instances to be defined. It is always possible to convert from partial to total and back in O(1).
| Partial | Total |
Partial methods like
apply(index: Int) for ArraySeq or
apply(key: K): V for ArrayMap are not provided. You have to convert the collection to the total version for that.
The extras module contains a few composite collections.
Basically just an array wrapped to ensure immutability. Specialized for fast primitive access.
A wrapped array with a default value, so that the
apply(index: Int) method can be defined. Having a total apply function allows to define many more typeclasses
A set backed by a sorted array. The internal representation is extremely compact, especially when using primitives. All boolean operations (union, intersect, diff, xor, subsetOf, intersects) are implemented efficiently.
A set backed by a sorted array, with an additional flag to allow negation, so it is possible to express e.g. "all Longs except 1".
The additional flag allows implementing the full Bool typeclass, because it is possible to define the e.g. the Set of all Longs, which is the
one method needed to implement Bool. The internal representation is extremely compact, especially when using primitives.
A map backed by a sorted array of keys and a corresponding array of values. The internal representation is extremely compact, especially when using primitives.
A map with default value, so that the apply method is total (hence the name). This map does not have as many operations as ArrayMap[K, V], but you can convert a TotalArrayMap[K, V] back to an ArrayMap[K, V] in O(1).
Read performance as well as creation performance will be significantly higher than scala collections when used properly. However, when adding single elements to large (100000 elements) collections, the performance will be pretty bad. So if that is your use-case, use something else. All collections contain Array in their name to keep you aware of the somewhat unusual performance characteristics.
Unfortunately, algebra currently does not contain a typeclass for hashing. The hashing typeclass in the scala library does not follow current conventions for typeclasses. Therefore,
Hash[T] is currently defined internally.