hammerlab / iterators

iterators

Enrichment-methods for Scala collections (Iterators, Iterables, and Arrays):

import hammerlab.iterator._
Iterator(1, 2, 3).nextOption           // Some(1)
Iterator(1, 2, 3).buffered.headOption  // Some(1)

Array(1, 2, 1, 3).countElems
// Map(1→2, 2→1, 3→1)

List(1, 1, 2, 1, 7, 7, 7).runLengthEncode
// Iterator(1→2, 2→1, 1→1, 7→3)

Methods are defined in org.hammerlab.iterator and re-exported from hammerlab.iterator for convenient importing.

examples

by package:

count

Array(1, 2, 1, 3).countElems
// Map(1→2, 2→1, 3→1)

Iterator('a → 1, 'b → 2, 'a → 10, 'c → 3).countByKey
// Map('a→2, 'b→1, 'c→1)

either

def L[T](t: T) = Left(t)
def R[T](t: T) = Right(t)

Iterator(R('a), R('b), L(4)).findLeft
// Some(4)

Iterator(
  R('a),
  L( 1),
  R('b),
  R('c),
  L( 2),
  L( 3),
  R('d)
)
.groupByLeft
.mapValues(_.mkString(""))
.toList
// List((1,bc), (2,""), (3,d))

end

.finish: run a closure once the iterator has been fully traversed:

import scala.io.Source.fromFile
val source = fromFile("build.sbt")
source
  .filter(_ == 'a')
  .finish({
    println("closing!")
    source.close()
  })
  .size

// 32
// prints "closing!" and closes `source` after traversal is finished
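
To illustrate the idea behind .finish, here is a minimal sketch using only the standard library. It is not the library's implementation; `finishSketch` and `onDone` are hypothetical names, and the sketch assumes the closure should fire exactly once, the first time `hasNext` reports that the underlying iterator is exhausted.

```scala
// Hypothetical wrapper (not the hammerlab implementation): runs `onDone`
// once, the first time `hasNext` observes that `elems` is exhausted.
def finishSketch[T](elems: Iterator[T])(onDone: => Unit): Iterator[T] =
  new Iterator[T] {
    private var done = false
    def hasNext: Boolean = {
      val more = elems.hasNext
      if (!more && !done) {
        done = true
        onDone
      }
      more
    }
    def next(): T = elems.next()
  }

var closed = false
val filtered = finishSketch(Iterator("a", "b", "a").filter(_ == "a")) { closed = true }
val n = filtered.size
// n == 2, and `closed` is true only after the traversal has finished
```

Because the check lives in `hasNext`, a consumer that stops early never triggers the closure; that is a design choice of this sketch, not a statement about the library.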

.dropright: drop k elements from the end of an iterator in O(k) space:

Iterator(1 to 10: _*).dropright(4)
// Iterator(1, 2, 3, 4, 5, 6)
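
One way to get the O(k) space bound is to hold back the most recently read elements in a queue and only emit an element once more than k are buffered behind it. The sketch below assumes that approach; `dropRightSketch` is a hypothetical helper, not the library's code.

```scala
import scala.collection.mutable

// Hypothetical helper: buffers at most k + 1 elements at any time.
def dropRightSketch[T](elems: Iterator[T], k: Int): Iterator[T] = {
  val held = mutable.Queue.empty[T]
  new Iterator[T] {
    // Top up the queue until it holds k + 1 elements or the input runs out.
    private def fill(): Unit =
      while (held.size <= k && elems.hasNext) held.enqueue(elems.next())

    // An element may be emitted only if k others are queued behind it.
    def hasNext: Boolean = { fill(); held.size > k }

    def next(): T = {
      if (!hasNext) throw new NoSuchElementException("next on empty iterator")
      held.dequeue()
    }
  }
}

val kept = dropRightSketch((1 to 10).iterator, 4).toList
// List(1, 2, 3, 4, 5, 6)
```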

group

Group runs of elements that satisfy a predicate or equivalence relation:

Iterator(1, 0, 2, 3, 0, 0, 4, 5, 6).groupRuns(_ > 0)
// Iterator(Iterator(1), Iterator(0), Iterator(2, 3), Iterator(0), Iterator(0), Iterator(4, 5, 6))

Run-length encode elements:

List(1, 1, 2, 1, 7, 7, 7).runLengthEncode
// Iterator(1→2, 2→1, 1→1, 7→3)
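
For readers curious how run-length encoding can be done lazily over an iterator, here is a small standard-library sketch. `runLengthEncodeSketch` is a hypothetical name and this is not necessarily how the library implements it; it relies on `BufferedIterator.head` to peek at the next element without consuming it.

```scala
// Hypothetical helper: emits (element, run length) pairs lazily.
def runLengthEncodeSketch[T](elems: Iterator[T]): Iterator[(T, Int)] = {
  val it = elems.buffered
  new Iterator[(T, Int)] {
    def hasNext: Boolean = it.hasNext
    def next(): (T, Int) = {
      val elem = it.next()
      var count = 1
      // Consume the rest of the run of elements equal to `elem`.
      while (it.hasNext && it.head == elem) {
        it.next()
        count += 1
      }
      (elem, count)
    }
  }
}

val encoded = runLengthEncodeSketch(Iterator(1, 1, 2, 1, 7, 7, 7)).toList
// List((1,2), (2,1), (1,1), (7,3))
```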

Contiguous weighted sums up to a maximum:

Iterator(1 to 6: _*).cappedCostGroups(costFn = x ⇒ x, limit = 10)
// Iterator(Iterator(1, 2, 3, 4), Iterator(5), Iterator(6))
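
The grouping above can be read as a greedy pass: keep adding elements to the current group while the running cost stays within the limit. The sketch below assumes that reading; `cappedGroupsSketch` is a hypothetical helper, and its handling of a single element whose cost exceeds the limit (it gets a group of its own) is an assumption rather than documented library behaviour.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: greedy contiguous groups whose summed cost is <= limit.
def cappedGroupsSketch[T](elems: Iterator[T], cost: T => Int, limit: Int): Iterator[List[T]] = {
  val it = elems.buffered
  new Iterator[List[T]] {
    def hasNext: Boolean = it.hasNext
    def next(): List[T] = {
      // Assumption: the first element always starts a group, even if it
      // exceeds the limit on its own.
      val first = it.next()
      val group = ArrayBuffer(first)
      var total = cost(first)
      // Peek at the next element and take it only if it still fits.
      while (it.hasNext && total + cost(it.head) <= limit) {
        val e = it.next()
        total += cost(e)
        group += e
      }
      group.toList
    }
  }
}

val groups = cappedGroupsSketch[Int]((1 to 6).iterator, x => x, 10).toList
// List(List(1, 2, 3, 4), List(5), List(6))
```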

level

Flatten a nested iterator but retain access to a cursor into the unflattened version:

val it1 = Iterator(1, 2)
val it2 = Iterator(3, 4)
val it = Iterator(it1, it2).level

it.cur.get == it1
// true

it.next
// 1

it.cur.get == it1
// true

it.next
// 2

it.cur.get == it2
// true

it.next
// 3

it.cur.get == it2
// true

it.next
// 4

it.cur
// None

ordered

A variety of merge operations are available for sequences that are mutually ordered (possibly with respect to some 3rd type that each of their elements can be converted to).

.eitherMerge

Merge two ordered sequences using Eithers to preserve provenance (or handle the case that the sequences' elements are not the same type):

Seq(1, 3, 4).eitherMerge(Seq(2, 3, 5, 6))
// Iterator(L(1), R(2), L(3), R(3), L(4), R(5), R(6))

.orMerge

Merge two ordered sequences using Ors:

Seq(1, 3, 4).orMerge(Seq(2, 3, 5, 6))
// Iterator(L(1), R(2), Both(3, 3), L(4), R(5), R(6))

.leftMerge

Collect right-side elements for each left-side element:

Seq(1, 3, 4).leftMerge(Seq(2, 3, 5, 6))
// Iterator((1,Iterator(2)), (3,Iterator(3)), (4,Iterator(5, 6)))

.merge

Merge two ordered sequences into a single ordered sequence:

Seq(1, 3, 4).merge(Seq(2, 3, 5, 6))
// Iterator(1, 2, 3, 3, 4, 5, 6)
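
As a rough illustration of what an ordered merge does, the sketch below merges two already-sorted iterators with the standard library only. `mergeSketch` is a hypothetical name, it requires both sides to share one element type with an `Ordering`, and it emits the left element first on ties (which matches the order shown in the .eitherMerge example, but is an assumption here).

```scala
// Hypothetical helper: lazily merges two sorted iterators of the same type.
def mergeSketch[T](left: Iterator[T], right: Iterator[T])(implicit ord: Ordering[T]): Iterator[T] = {
  val l = left.buffered
  val r = right.buffered
  new Iterator[T] {
    def hasNext: Boolean = l.hasNext || r.hasNext
    def next(): T =
      if (!r.hasNext) l.next()
      else if (!l.hasNext) r.next()
      else if (ord.lteq(l.head, r.head)) l.next() // ties go to the left side
      else r.next()
  }
}

val merged = mergeSketch(Iterator(1, 3, 4), Iterator(2, 3, 5, 6)).toList
// List(1, 2, 3, 3, 4, 5, 6)
```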

Merging with a 3rd type

Instances of the View type-class let merges use a type other than that of the elements being merged:

// Rank a (Symbol,Int) pair using its Int value
implicit val view = View[(Symbol, Int), Int](_._2)

Seq('a → 1, 'b → 3).merge('c → 2)
// Iterator('a→1, 'c→2, 'b→3)

Seq('a → 1, 'b → 3).eitherMerge(2)
// Iterator(L('a→1), R(2), L('b→3))

range

sliceOpt, given a start and length:

(0 to 9).sliceOpt(0,  5)
// 0 to 4

(0 to 9).sliceOpt(0, 11)
// 0 to 9

(0 to 9).sliceOpt(2, 10)
// 2 to 9

(0 to 9).sliceOpt(2,  1)
// 2 to 2

Also, .joinOverlaps left-merges sequences of Ranges, sorted by start-coordinate, based on overlaps.

sample

Reservoir-sample:

Iterator(1 to 100: _*).sample(5)
// e.g. Array(15, 18, 55, 63, 98)
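
For context, reservoir sampling keeps a fixed-size sample while streaming over input of unknown length. Below is a minimal sketch of the classic "Algorithm R" variant; `reservoirSampleSketch` and its `rng` parameter are hypothetical, and the library may use a different variant or return type.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Hypothetical helper: uniform sample of up to k elements in one pass.
def reservoirSampleSketch[T](elems: Iterator[T], k: Int, rng: Random = new Random()): Vector[T] = {
  val reservoir = ArrayBuffer.empty[T]
  var seen = 0
  for (e <- elems) {
    seen += 1
    if (reservoir.size < k) reservoir += e // fill the reservoir first
    else {
      // Keep the new element with probability k / seen, replacing a
      // uniformly chosen slot.
      val j = rng.nextInt(seen)
      if (j < k) reservoir(j) = e
    }
  }
  reservoir.toVector
}

val picked = reservoirSampleSketch((1 to 100).iterator, 5)
// 5 distinct values from 1 to 100; the exact values vary from run to run
```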

scan

  • scanL, scanR
  • in terms of cats.Monoid
  • optionally include the final (total/sum) element instead of the initial empty/zero

import hammerlab.monoid._  // some Monoid defaults

Seq(1, 2, 3, 4).scanL
// Iterator(0, 1, 3, 6)

Seq(1, 2, 3, 4).scanLeftInclusive
// Iterator(1, 3, 6, 10)

Seq(1, 2, 3, 4).scanR
// Iterator(9, 7, 4, 0)

Seq(1, 2, 3, 4).scanRightInclusive
// Iterator(10, 9, 7, 4)
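
To relate these to the standard library: `scanLeft` and `scanRight` return both the zero element and the total, so the exclusive and inclusive results above correspond to dropping one end or the other. The snippet below is only a comparison using plain `Int` addition in place of a `Monoid`, not the library's implementation.

```scala
val xs = Seq(1, 2, 3, 4)

// Standard-library scans include both the initial zero and the final total.
val lefts  = xs.scanLeft(0)(_ + _)   // 0, 1, 3, 6, 10
val rights = xs.scanRight(0)(_ + _)  // 10, 9, 7, 4, 0

val leftExclusive  = lefts.init   // 0, 1, 3, 6   (like scanL above)
val leftInclusive  = lefts.tail   // 1, 3, 6, 10  (like scanLeftInclusive)
val rightExclusive = rights.tail  // 9, 7, 4, 0   (like scanR)
val rightInclusive = rights.init  // 10, 9, 7, 4  (like scanRightInclusive)
```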

Additionally, scan over values of kv-pairs:

Seq('a'→1, 'b'→2, 'c'→3, 'd'→4).scanLeftValues
// Iterator((a,0), (b,1), (c,3), (d,6))

Seq('a'→1, 'b'→2, 'c'→3, 'd'→4).scanLeftValuesInclusive
// Iterator((a,1), (b,3), (c,6), (d,10))

Seq('a'→1, 'b'→2, 'c'→3, 'd'→4).scanRightValues
// Iterator((a,9), (b,7), (c,4), (d,0))

Seq('a'→1, 'b'→2, 'c'→3, 'd'→4).scanRightValuesInclusive
// Iterator((a,10), (b,9), (c,7), (d,4))

sliding

Windows of size 2, including an optional next or previous element:

Seq(1, 2, 3).sliding2
// Iterator((1,2), (2,3))

Seq(1, 2, 3).sliding2Opt
// Iterator((1,Some(2)), (2,Some(3)), (3,None))

Seq(1, 2, 3).sliding2Prev
// Iterator((None,1), (Some(1),2), (Some(2),3))

Windows of size 3: full tuples only, one optional predecessor and one optional successor, or two optional successors:

Seq(1, 2, 3, 4).sliding3
// Iterator((1,2,3), (2,3,4))

Seq(1, 2, 3, 4).sliding3Opt
// Iterator((None,1,Some(2)), (Some(1),2,Some(3)), (Some(2),3,Some(4)), (Some(3),4,None))

Seq(1, 2, 3, 4).sliding3NextOpts
// Iterator((1,Some(2),Some(3)), (2,Some(3),Some(4)), (3,Some(4),None), (4,None,None))

Windows of arbitrary size; the output has the same number of elements as the input, so trailing windows are shorter:

Seq(1, 2, 3, 4, 5).slide(4)
// Iterator(Seq(1, 2, 3, 4), Seq(2, 3, 4, 5), Seq(3, 4, 5), Seq(4, 5), Seq(5))
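
The same shape of output can be approximated with the standard library's `tails`, which may help clarify the semantics. `slideSketch` is a hypothetical helper; unlike an iterator-based version it needs a `Seq` up front, so it is a sketch of the behaviour rather than of the implementation.

```scala
// Hypothetical helper: one window per input element, each at most n long.
def slideSketch[T](elems: Seq[T], n: Int): Iterator[Seq[T]] =
  elems.tails            // every suffix, ending with the empty one
    .filter(_.nonEmpty)  // drop the trailing empty suffix
    .map(_.take(n))      // cap each window at n elements

val windows = slideSketch(Seq(1, 2, 3, 4, 5), 4).toList
// List(List(1, 2, 3, 4), List(2, 3, 4, 5), List(3, 4, 5), List(4, 5), List(5))
```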

start

  • take, drop, and collect variants with deterministic semantics regarding the iterator they were called on
  • headOption, nextOption