shawjef3 / depthfirst

A faster implementation of flatMap, map, and withFilter for large collections.

GitHub

Stackless flatMap

by Jeff Shaw

Motivation

While thinking about data locality, performance, and flatMap, I created an implementation of flatMap that is independent of the collection type, and faster than the usual method for large numbers of flatMaps. Basically it eliminates intermediate collections. Use it when you have a large number of chained flatMaps.

Implementation

I created a run-time for Scala's for syntax operations that performs them in a depth-first manner. It uses a stack of iterators. A stack element is an operation to perform, along with the values for the operation. When a value is requested, operations are performed until a value without no further operations is found.

Clever readers will realize that this is similar to the way Stream works. If you have a Stream that is the result of various operations, and then ask for an element, the data structure will perform the minimum work to get it.

I have not analyzed the memory use of Stackless.

Use

Dependency

The current draft is on Sonatype's Maven repository for Scala 2.10, 2.11 and 2.12.

"me.jeffshaw.stackless" %% "stackless" % "0.1"

Example

Stackless is a data structure that you build up using the familiar for syntax. filter and flatMap do not execute your functions, but instead build an additional data structure, which is a TraversableOnce. You can force execution using map, foreach, toIterator, or any methods that use those methods, such as ==.

The following will print the results immediately.

import me.jeffshaw.stackless.Stackless

val vv = Vector(1,2,3)

def bind(x: Int): Vector[Int] = Vector(x)

for {
  v0 <- Stackless(vv)
  v1 <- bind(v0)
  v2 <- bind(v1)
} println(v2)

The following will create the resulting collection before printing the results.

import me.jeffshaw.stackless.Stackless

val vv = Vector(1,2,3)

def bind(x: Int): Vector[Int] = Vector(x)

val vv_ = {
  for {
    v0 <- Stackless(vv)
    v1 <- bind(v0)
    v2 <- bind(v1)
  } yield v2
}.toVector

println(vv_)

StacklessList

StacklessList demonstrates a data structure that is inherently stackless. You can use StacklessList as you would List, although there are odd performance characteristics. For instance, if you append to the end of a StacklessList, it forces execution.

CPU Benchmarks

I have run benchmarks using various sizes of Vector[Int]s, various numbers of flatMaps of x => Vector(x), and on a few different CPUs. Generally speaking, if you are using collections on the order of ______, or have _ or more flatMaps, you'll gain 10% by using Stackless. The benefit increases as the collection size increases, the number of flatMaps increases, or the amount of cache your CPU has increases.

Following are graphs of the % improvement you can expect from overhead coming from your collection operations. I apologize for the inconsistent colors. I'll try to improve them in the future.

DepthFirstBenchmarks.scala

If the images aren't loading, try the pdf.

vs Vector

DepthFirst performs better when using Vectors if your Vectors have more than about 1000 elements.

6 MB cache, Intel i7 - 4980 HQ, dedicated MacBook

image

8 MB cache, AMD Ryzen 1700, dedicated Windows

image

30 MB cache, Intel E5-2650L, VM from Digital Ocean

image

40 MB cache, Intel E5-2697A, VM from Digital Ocean

image

vs Stream

Stream performs much better than using Vector directly, and so we have to be using collections on the order of 10^6 to see a benefit from DepthFirst.

6 MB cache, Intel i7 - 4980 HQ, dedicated MacBook

image

8 MB cache, AMD Ryzen 1700, dedicated Windows

image

30 MB cache, Intel E5-2650L, VM from Digital Ocean

image

40 MB cache, Intel E5-2697A, VM from Digital Ocean

image

Best Case

The ideal use of DepthFirst is on a CPU with a large cache and large collections. The number of operations doesn't much matter. A CPU with 40 MB of cache and a collection with 3 million elements can see a 69% improvement in the overhead used by a single flatMap. Additional flatMaps yield an increase in the improvement.

image

Data

See benchmark_data.csv.

License

The contents of this repository are usable under the terms of GPL 3.