Useful extensions to Scala's Iterator. Think errata for iterators.
Using SBT:
libraryDependencies += "com.timgroup" %% "iterata" % "0.1.6"
Or download the jar directly from Maven Central.
Iterata is currently published for Scala 2.11 only. Please let us know if you'd like a build for a different Scala version.
Use the #par() method to add parallelism when processing an Iterator with functions chained via #map and #flatMap. It will eagerly evaluate the underlying iterator in chunks, and then evaluate the functions on each chunk via the Scala Parallel Collections. For example:
scala> import com.timgroup.iterata.ParIterator.Implicits._
scala> val it = (1 to 100000).iterator.par().map(n => (n + 1, Thread.currentThread.getId))
scala> it.map(_._2).toSet.size
res2: Int = 8 // addition was distributed over 8 threads
You can provide a specific chunk size, for example it.par(100).
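To make the chunking idea concrete, here is a minimal sketch of the same technique using only the standard library. It is not iterata's actual implementation: the names ChunkedParSketch and chunkedParMap are hypothetical, and it assumes Scala 2.11 or 2.12, where .par is built in (on 2.13 it lives in the separate scala-parallel-collections module and needs import scala.collection.parallel.CollectionConverters._).

```scala
// Hypothetical illustration only, not iterata's source code.
object ChunkedParSketch {
  // Eagerly pulls `chunkSize` elements at a time from the iterator,
  // maps each chunk with a parallel collection, then flattens the
  // results back into a single iterator. Element order is preserved.
  def chunkedParMap[A, B](it: Iterator[A], chunkSize: Int)(f: A => B): Iterator[B] =
    it.grouped(chunkSize).flatMap(chunk => chunk.par.map(f).seq)
}
```

Under these assumptions, ChunkedParSketch.chunkedParMap((1 to 10).iterator, 3)(_ + 1) yields the same elements, in the same order, as (1 to 10).iterator.map(_ + 1). Larger chunks give the thread pool more work per batch, at the cost of holding more elements in memory at once.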
Note that only the following Iterator methods are implemented (so far) to make use of parallel collections:
#map, #flatMap, #filter, #find
The #par() method is available on any iterator, and takes an optional chunk size parameter. However, if you already have a GroupedIterator, you can simply call #par since it is already grouped. For example:
scala> val it = (1 to 100000).iterator.grouped(4).par
Use the #memoizeExhaustion method to wrap an Iterator so that its #hasNext method will
not be called again after returning false. This is useful in cases where it is expensive
to check if there is a next element, such as when I/O is involved.
It can also serve as a workaround for SI-9623, where
after concatenating two iterators with ++, the left iterator's #hasNext will be called twice
for every call to the right iterator's #next().
scala> import com.timgroup.iterata.MemoizeExhaustionIterator.Implicits._
scala> val it1 = new IteratorWithExpensiveHasNext()
scala> val it2 = new IteratorWithExpensiveHasNext()
scala> (it1.memoizeExhaustion ++ it2).foreach(_ => ())
scala> it1.numTimesHasNextReturnedFalse
res2: Int = 1
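For readers curious how such a wrapper might work, here is a minimal sketch under stated assumptions. It is not iterata's actual source: MemoizeExhaustionSketch and CountingIterator are hypothetical names introduced only for this illustration. The wrapper remembers the first time the underlying #hasNext returns false and never asks again.

```scala
// Hypothetical illustration only, not iterata's source code.
final class MemoizeExhaustionSketch[A](underlying: Iterator[A]) extends Iterator[A] {
  private[this] var exhausted = false

  // Delegates to the underlying iterator until it first reports false,
  // then answers false from the cached flag without delegating again.
  override def hasNext: Boolean =
    !exhausted && {
      val more = underlying.hasNext
      if (!more) exhausted = true
      more
    }

  override def next(): A =
    if (exhausted) throw new NoSuchElementException("next on exhausted iterator")
    else underlying.next()
}

// Hypothetical helper that counts how often hasNext returned false,
// standing in for an iterator whose hasNext is expensive.
final class CountingIterator[A](underlying: Iterator[A]) extends Iterator[A] {
  var numTimesHasNextReturnedFalse = 0

  override def hasNext: Boolean = {
    val more = underlying.hasNext
    if (!more) numTimesHasNextReturnedFalse += 1
    more
  }

  override def next(): A = underlying.next()
}
```

Wrapping a CountingIterator in MemoizeExhaustionSketch, draining it, and then calling #hasNext a few more times should leave numTimesHasNextReturnedFalse at 1, because the repeated checks are answered from the cached flag.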