jrudolph / speed   15

MIT License GitHub

Looping like a pro

Scala versions: 2.11 2.10

import speed._

🍰 from the hot shores of the black forest 🍰

What?

speed is a compile-time only macro library that provides functionality similar to Scala collection's views but with an efficient implementation that inlines the complete processing into a (mostly) single while-loop at the use site.

As an example see this:

array.speedy.map(x => x * x).foldLeft(0)(_ + _)

will be converted into code roughly equivalent to this:

var acc = 0
var i = 0
while (i < array.length) {
  val x = array(i)
  acc = acc + x * x
  i += 1
}
acc

Installation

resolvers += Opts.resolver.sonatypeReleases

// only needed at compile-time
libraryDependencies += "net.virtual-void" %% "speed" % "15" % "provided"

Usage

There are two principle ways of using speed. Either, you explicitly request optimization by "converting" a collection into a speedy one using .speedy. Or, you can also import automatic conversions in which case supported collections will be auto-optimized (automatic conversion currently only supported for Range and Array).

Similar to Java 7 Streams, a speed expression must start with a "Generator" like a Range or an Array, then apply some non-terminal processing steps, and finally end with a terminal result.

Intermediate results are usually never materialized into a collection and therefore cannot be put into a variable. Instead, use to[Coll] as a terminal step to create a result collection if needed.

Explicit

Add an import to speed in the source file:

import speed._

and then, convert a collection to its speedy variant by adding .speedy (similar as you would add .view):

array.speedy.map(_ + 2).sum // this will run fast...

Automatic

Ranges and arrays can also be automatically optimized by import speed.auto._ at the top of a source file:

import speed.auto._

array.map(_ + 2).sum // this will also run fast...

Semantics

Semantics are similar to what to expect from Scala collection's views. However, if in question we'll take the liberty to divert from the known behavior to allow more aggressive optimizations.

The basic optimization approach is based on aggressive inlining the whole processing pipeline into a single while-loop.

Apart from that, speed aggressively tries to make use of static information in several ways:

  • special cased range loops for the common cases (step = +/- 1)
  • static optimizations over the operations structure to avoid work where possible
  • a final run of constant-folding tries to gather static information

Note: function arguments to map, filter, etc. are assumed to be pure and to execute no side-effects to be able to reorder operations for optimizations. You cannot rely on side-effects in these functions as they may not be executed for speed optimized code if the result of the function call isn't needed for the final result.

Features

Supported collection types

  • Range
  • Array
  • IndexedSeq
  • List

Supported operations

speed currently supports a subset of interesting and important collection operations from the Scala collections.

Non-Terminal

See the NonTerminalOps type:

  • flatMap
  • map
  • filter
  • withFilter
  • reverse
  • take
  • (more to come)

Terminal

See the TerminalOps type:

  • foldLeft
  • foreach
  • reduce
  • sum
  • min
  • max
  • count
  • size
  • mkString
  • to[Coll[_]]
  • forall
  • (more to come)

Optimizations

speed already employs some optimizations on the operations structure like these:

  • try to push selection of elements down to the generator (like take or reverse)
  • implement array operations in terms of range
  • generate three different kind of basic loops for ranges depending on static information about start, end, step and isInclusive if available

Planned

  • for-comprehension pattern extraction support (avoid tuple generation)
  • replace well-known Numeric and Ordering instances by primitive operations

How?

In the words of the master of enrichment:

Yo dawg, we heard you like your scala fast, so we put code in yo' code so it loops while it loops

Debugging Tools

  • Wrap an expression with speed.impl.Debug.show to show the generated code

Known issues

It is known that speed still optimizes side-effects in user code too aggressively away in some cases. If you encounter such an issue please report it on the issue tracker.

Related work

Benchmarks

Runtime (smaller is better, reference is an equivalent while loop, "Scala" uses the same expression with views)

Description While Speedy Scala
foreach counting 100 % 102.89 % 430.06 %
nested counting 100 % 98.63 % 389.60 %
filtered summing 100 % 99.40 % 861.21 %
mapped summing 100 % 100.15 % 3985.92 %
array foreach counting 100 % 99.24 % 450.01 %
array summing 100 % 100.23 % 625.30 %
array filtered summing 100 % 103.45 % 1137.95 %
array mapped summing 100 % 101.55 % 2101.77 %
size of filtered ref array 100 % 98.30 % 929.55 %
array map - filter - to 100 % 95.01 % 291.28 %
list foreach counting 100 % 100.07 % 100.87 %
list summing 100 % 104.40 % 202.86 %
list filtered summing 100 % 104.42 % 627.96 %
list mapped summing 100 % 100.08 % 769.58 %
nested list summing 100 % 99.52 % 1133.47 %

(Don't be fooled, values < 100 % are most likely not significant but I'm including them here anyways just for the giggles. πŸ˜†)

(Don't be fooled2, I obviously selected benchmarks in the favor of speed, if you'd like to see other comparisons please PR.)

Benchmark results can be regenerated by running the PerformanceSpecs in the code base.

Foreach counting

var counter = 0
for (i ← 1 to 1000) counter += i * i
counter

Nested counting

var counter = 0
for {
  i ← 1 to 100
  j ← 1 to 100
} counter += i * j
counter

Filtered summing

(1 to 1000).filter(_ % 3 == 0).sum

Mapped summing

(1 to 1000).map(i β‡’ i * i).sum

Array foreach counting

var counter = 0
for (x ← array) counter += x * x
counter

Array summing

array.sum

Array filtered summing

array.filter(_ % 3 == 0).sum

Array mapped summing

array.map(x β‡’ x * x).sum

Size of Filtered ref Array

class Ref(var num: Int = 0)

val N = 1000
val refs = (0 until N).map(i β‡’ new Ref(i)).toArray

refs
  .filter(_.num % 5 == 0)
  .filter(_.num % 7 == 0)
  .size

Array Map - Filter - To

array.map(x β‡’ x * x).filter(_ % 3 == 0).to[List]

List foreach counting

var counter = 0
for (x ← list) counter += x * x
counter

List summing

list.sum

List filtered summing

list.filter(_ % 3 == 0).sum

List mapped summing

list.map(x β‡’ x * x).sum

Nested list summing

(for (x ← list1; y ← list2) yield x * y).sum

githalytics.com alpha