from the hot shores of the black forest
speed is a compile-time-only macro library that provides functionality similar to Scala collections' views, but with an efficient implementation that inlines the complete processing into (mostly) a single while-loop at the use site.
As an example, this expression:
array.speedy.map(x => x * x).foldLeft(0)(_ + _)
will be converted into code roughly equivalent to this:
var acc = 0
var i = 0
while (i < array.length) {
  val x = array(i)
  acc = acc + x * x
  i += 1
}
acc
resolvers += Opts.resolver.sonatypeReleases
// only needed at compile-time
libraryDependencies += "net.virtual-void" %% "speed" % "15" % "provided"
There are two principal ways of using speed. Either you explicitly request optimization by "converting" a collection into a speedy one using .speedy, or you import the automatic conversions, in which case supported collections are auto-optimized (automatic conversion is currently only supported for Range and Array).
Similar to Java 8 Streams, a speed expression must start with a "generator" like a Range or an Array, then apply some non-terminal processing steps, and finally end with a terminal result. Intermediate results are usually never materialized into a collection and therefore cannot be put into a variable. Instead, use to[Coll] as a terminal step to create a result collection if needed.
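For example, a pipeline that should produce an actual collection can end in to[Coll], roughly like this (a small sketch, assuming an Array[Int] named array is in scope):

```scala
import speed._

// map and filter are fused into a single loop; to[List] is the terminal
// step that materializes the final result into a collection
val evenSquares: List[Int] =
  array.speedy.map(x => x * x).filter(_ % 2 == 0).to[List]
```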
Add an import of speed to the source file:
import speed._
and then convert a collection to its speedy variant by adding .speedy (similar to how you would add .view):
array.speedy.map(_ + 2).sum // this will run fast...
Ranges and arrays can also be automatically optimized by importing speed.auto._ at the top of a source file:
import speed.auto._
array.map(_ + 2).sum // this will also run fast...
Semantics are similar to what you would expect from Scala collections' views. However, when in doubt we take the liberty to diverge from the known behavior to allow more aggressive optimizations.
The basic optimization approach is based on aggressively inlining the whole processing pipeline into a single while-loop.
Apart from that, speed aggressively tries to make use of static information in several ways:
- special cased range loops for the common cases (step = +/- 1)
- static optimizations over the operations structure to avoid work where possible
- a final run of constant-folding tries to gather static information
Note: function arguments to map, filter, etc. are assumed to be pure and to execute no side effects, so that operations can be reordered for optimization. You cannot rely on side effects in these functions, as they may not be executed in speed-optimized code if the result of the function call isn't needed for the final result.
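To make this concrete, here is a hedged sketch: size only needs the number of elements, not the mapped values, so the side effect inside map may be optimized away entirely.

```scala
import speed._

// don't rely on the println: since size never uses the mapped values,
// speed may drop the call to the map function completely
val n = array.speedy.map { x => println(x); x * x }.size
```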
Supported collections:
- Range
- Array
- IndexedSeq
- List
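A minimal sketch of what explicit use on each of these collections could look like (assuming .speedy is available on all of them, as listed above):

```scala
import speed._

(1 to 100).speedy.filter(_ % 2 == 0).sum      // Range
Array(1, 2, 3, 4).speedy.map(_ * 2).max       // Array
Vector(1, 2, 3, 4).speedy.foldLeft(0)(_ + _)  // IndexedSeq
List(1, 2, 3, 4).speedy.count(_ > 2)          // List
```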
speed currently supports a subset of interesting and important collection operations from the Scala collections; a combined example follows the two lists below.
See the NonTerminalOps type:
- flatMap
- map
- filter
- withFilter
- reverse
- take
- (more to come)
See the TerminalOps type:
- foldLeft
- foreach
- reduce
- sum
- min
- max
- count
- size
- mkString
- to[Coll[_]]
- forall
- (more to come)
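As a combined illustration, a pipeline can chain several of the non-terminal operations above and end with one terminal operation (a sketch, assuming an Array[Int] named array):

```scala
import speed._

val array = Array(5, 3, 8, 1, 9, 2)

// non-terminal steps: filter, map, take; terminal step: mkString
val firstOddsTimesTen: String =
  array.speedy
    .filter(_ % 2 == 1)
    .map(_ * 10)
    .take(2)
    .mkString(", ")
```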
speed already employs some optimizations on the operations structure, like these:
- try to push selection of elements down to the generator (like take or reverse)
- implement array operations in terms of ranges
- generate three different kinds of basic loops for ranges, depending on the static information available about start, end, step and isInclusive
- for-comprehension pattern extraction support (avoids tuple generation)
- replace well-known Numeric and Ordering instances by primitive operations (see the sketch below)
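As an illustration of the last point, max on an Int array normally goes through Ordering[Int]; conceptually, the generated loop can use the primitive > comparison directly instead. A rough sketch of the loop shape this aims for (the actual generated code may differ):

```scala
import speed._

// source expression
val largest = array.speedy.max

// sketch of the intended loop shape: no Ordering[Int] instance and no
// boxing, just primitive comparisons over the array elements
var result = array(0)
var i = 1
while (i < array.length) {
  if (array(i) > result) result = array(i)
  i += 1
}
```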
- Wrap an expression with speed.impl.Debug.show to show the generated code
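For example, to inspect what gets generated for a pipeline, one might wrap it like this (a sketch; the exact way Debug.show reports the generated code is up to the macro):

```scala
import speed._

speed.impl.Debug.show {
  (1 to 1000).speedy.filter(_ % 3 == 0).sum
}
```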
It is known that speed still optimizes away side effects in user code too aggressively in some cases. If you encounter such an issue, please report it on the issue tracker.
Runtime (smaller is better; the reference is an equivalent while loop, "Scala" uses the same expression with collection views)
Description | While | Speedy | Scala |
---|---|---|---|
foreach counting | 100 % | 102.89 % | 430.06 % |
nested counting | 100 % | 98.63 % | 389.60 % |
filtered summing | 100 % | 99.40 % | 861.21 % |
mapped summing | 100 % | 100.15 % | 3985.92 % |
array foreach counting | 100 % | 99.24 % | 450.01 % |
array summing | 100 % | 100.23 % | 625.30 % |
array filtered summing | 100 % | 103.45 % | 1137.95 % |
array mapped summing | 100 % | 101.55 % | 2101.77 % |
size of filtered ref array | 100 % | 98.30 % | 929.55 % |
array map - filter - to | 100 % | 95.01 % | 291.28 % |
list foreach counting | 100 % | 100.07 % | 100.87 % |
list summing | 100 % | 104.40 % | 202.86 % |
list filtered summing | 100 % | 104.42 % | 627.96 % |
list mapped summing | 100 % | 100.08 % | 769.58 % |
nested list summing | 100 % | 99.52 % | 1133.47 % |
(Don't be fooled: values < 100 % are most likely not significant, but I'm including them here anyway just for the giggles.)
(Don't be fooled 2: I obviously selected benchmarks in favor of speed; if you'd like to see other comparisons, please open a PR.)
Benchmark results can be regenerated by running the PerformanceSpecs in the code base. The individual benchmark bodies are listed below:
// foreach counting
var counter = 0
for (i <- 1 to 1000) counter += i * i
counter

// nested counting
var counter = 0
for {
  i <- 1 to 100
  j <- 1 to 100
} counter += i * j
counter

// filtered summing
(1 to 1000).filter(_ % 3 == 0).sum

// mapped summing
(1 to 1000).map(i => i * i).sum

// array foreach counting
var counter = 0
for (x <- array) counter += x * x
counter

// array summing
array.sum

// array filtered summing
array.filter(_ % 3 == 0).sum

// array mapped summing
array.map(x => x * x).sum

// size of filtered ref array
class Ref(var num: Int = 0)
val N = 1000
val refs = (0 until N).map(i => new Ref(i)).toArray
refs
  .filter(_.num % 5 == 0)
  .filter(_.num % 7 == 0)
  .size

// array map - filter - to
array.map(x => x * x).filter(_ % 3 == 0).to[List]

// list foreach counting
var counter = 0
for (x <- list) counter += x * x
counter

// list summing
list.sum

// list filtered summing
list.filter(_ % 3 == 0).sum

// list mapped summing
list.map(x => x * x).sum

// nested list summing
(for (x <- list1; y <- list2) yield x * y).sum