Version Matrix

NeuroFlow is a library to design, train and evaluate Artificial Neural Networks.

Getting Started

The library aims at ease of use, keeping things intuitive and simple. Neural Nets bring joy into your project, a journey full of experiments. :o)

There are three modules:

  • core: building blocks for neural networks
  • application: plugins, helpers, functionality related to application
  • playground: examples how to use it

To use NeuroFlow for Scala 2.12.x, add these dependencies to your SBT project:

libraryDependencies ++= Seq(
  "com.zenecture"   %%   "neuroflow-core"          %   "1.7.8",
  "com.zenecture"   %%   "neuroflow-application"   %   "1.7.8"

resolvers ++= Seq(
  "neuroflow-libs" at ""

If you are new to the math behind Neural Nets, you can read about the core principles here:

Seeing code examples is a good way to get started. You may have a look at the playground for basic inspiration.

Construction of a Net

Let's construct the fully connected feed-forward net (FFN) depicted above.

import neuroflow.core.Activators.Double._
import neuroflow.core._
import neuroflow.dsl._
import neuroflow.nets.cpu.DenseNetwork._

implicit val weights = WeightBreeder[Double].normal(μ = 0.0, σ = 0.1)

val (g, h) = (Sigmoid, Sigmoid)

val net = Network(
  layout = Vector(2) :: Dense(3, g) :: Dense(1, h) :: SquaredError()

This gives a fully connected DenseNetwork under the SquaredError loss function, running on CPU. The weights are drawn from normal distribution by WeightBreeder. We have predefined activators and place a softly firing Sigmoid on the cells.

A full model is expressed as a linear Layout graph and a Settings instance. The layout is implemented as a heterogenous list, allowing compile-time checks for valid compositions. For instance, a little deeper net, with some rates and rules defined, could look like this:

val (e, f) = (Some(Linear.biased(0.1)), ReLU)

val L =
      Vector   (11, e)      ::
      Dense    ( 3, f)      ::
      Dense   (180, f)      ::
      Dense   (360, f)      ::
      Dense   (420, f)      ::
      Dense   (314, f)      ::
      Dense   (271, f)      :: 
      Dense    (23, f)      ::    SoftmaxLogEntropy()

val deeperNet = Network(
  layout = L, 
  settings = Settings[Double](
    updateRule = Vanilla(), 
    batchSize = Some(8), 
    iterations = 256,
    learningRate = { 
      case (iter, α) if iter < 128 => 1E-4
      case (_, _)  => 1E-6
    precision = 1E-8

Here, the SoftmaxLogEntropy layer computes loss and gradient, which is backpropped into the last Dense layer of the net. The updateRule defines how weights are updated for gradient descent. The batchSize defines how many samples are presented per weight update. The learningRate is a partial function from current iteration and learning rate producing a new learning rate. Training terminates after iterations, or if loss satisfies precision.

Another important aspect of the net is its numerical type. For example, on the GPU, you might want to work with Float instead of Double. The numerical type is set by explicitly annotating it on both the WeightBreeder and Settings instances.

Have a look at the Settings class for the complete list of options.


Our small net is a function f: X -> Y. It maps from 2d-vector X to 1d-vector Y. There are many functions of this kind to learn out there. Here, we go with the XOR function. It is linearly not separable, so we can check whether the net can capture this non-linearity.

To learn, we need to know what it means to be wrong. The SquaredError loss function is defined as follows:

SquaredError(X, Y, W) = Σ1/2(Y - net(X, W))²

Where W are the weights, Y is the target and net(X, W) the prediction. The sum Σ is taken over all samples and the square ² gives a convex functional form. We interpret the XOR-adder as a regression challenge, so the SquaredError is our choice. Alternatively, for 1-of-K classification, we could use the SoftmaxLogEntropy loss function, for N-of-K SoftmaxLogMultEntropy respectively.

Example: Derivative for w8

In NeuroFlow, we work with Breeze, in particular with DenseVector[V] and DenseMatrix[V]. Let's define the XOR training data using in-line vector notation:

import neuroflow.application.plugin.Notation._

val xs = Seq(->(0.0, 0.0), ->(0.0, 1.0), ->(1.0, 0.0), ->(1.0, 1.0))
val ys = Seq(->(0.0), ->(1.0), ->(1.0), ->(0.0))

  It's the XOR-Function :-).
  Or: the net learns to add binary digits modulo 2.

net.train(xs, ys)

And then we call train on net to start.


The training progress is printed on console so we can track it.

[run-main-0] INFO neuroflow.nets.cpu.DenseNetworkDouble - [14.01.2018 22:26:56:188] Training with 4 samples, batch size = 4, batches = 1 ...
[run-main-0] INFO neuroflow.nets.cpu.DenseNetworkDouble - [14.01.2018 22:26:56:351] Iteration 1.1, Avg. Loss = 0,525310, Vector: 0.5253104527125074  
[run-main-0] INFO neuroflow.nets.cpu.DenseNetworkDouble - [14.01.2018 22:26:56:387] Iteration 2.1, Avg. Loss = 0,525220, Vector: 0.5252200280272876  

One line is printed per iteration, Iteration a.b where a is the iteration count, b is the batch and Avg. Loss is the mean of the summed batch loss Vector. The batch count b loops, depending on the batch size, whereas the iteration count a progresses linearly until training is finished.

Rolling over all samples with mini-batches (M:N).

To visualize the loss function, we can iteratively append the Avg. Loss to a file with LossFuncOutput.

  lossFuncOutput = Some(LossFuncOutput(file = Some("~/NF/lossFunc.txt")))

Now we can use beloved gnuplot:

gnuplot> set style line 1 lc rgb '#0060ad' lt 1 lw 1 pt 7 ps 0.5 
gnuplot> plot '~/NF/lossFunc.txt' with linespoints ls 1

Another way to visualize the training progress is to use the LossFuncOutput with a function action: Double => Unit.

  lossFuncOutput = Some(LossFuncOutput(action = Some(loss => sendToDashboard(loss))))

This function gets executed in the background after each iteration, using the Avg. Loss as input. One example is sending the loss to a real-time TV dashboard.

Useful JVM args

-Dorg.slf4j.simpleLogger.defaultLogLevel=debug # for misc runtime sys infos and gpu memory
-Xmx24G # example to increase heap size


When training is done, the net can be evaluated like a regular function:

val x = ->(0.0, 1.0)
val result = net(x)
println(result) // DenseVector(0.9940081702899719)

The resulting vector has dimension = 1, as specified for the XOR-example.


We can put focus on a layer and use it as the actual model output, instead of the last layer. For instance, here is a simple AutoEncoder ae:

import neuroflow.dsl.Implicits._

val b = Dense(5, Linear)
val L = Vector(23) :: b :: Dense(23, Sigmoid) :: AbsCubicError()
val ae = Network(layout = L)

ae.train(xs, xs)

It learns the input identity, but we are interested in the 5-dimensional activation from bottleneck layer b to produce a simple, compressed version of the input.

val focused = ae focus b 
val result = focused(->(0.1, 0.2, ..., 0.23))
println(result) // DenseVector(0.2, 0.7, 0.1, 0.8, 0.2)

The focus on layer b gives a function, which can be applied just like the net it stems from. The type signature of the function is derived from the focused layer's algebraic type.

Another scenario where focusing is useful is when weights are initialized, i. e. the activations of the layers can be watched and adjusted to find good values, if a JVM debugger can't be attached.



A neural net consists of matrix multiplications and function applications. Since matrix multiplication is inherently linear here, all non-linearity has to come from the cells activators. The predefined ones are common and should be sufficient for most data, but at times special functions are required. Here is an example how to define your own:

val c = new Activator[Double] {
  val symbol = "My linear activator"
  def apply(x: Double): Double = x + 0.1
  def derivative(x: Double): Double = 1.0

Then just drop it into a layer, e. g. Dense(3, c). Luckily, the CPU implementation is flexible to run arbitrary code. If you need custom activators for GPU, you need to fork NF and implement them in CUDA.

Loss Functions

You can tackle a lot of challenges by using the predefined loss functions. However, as with the activators, at times you need your own loss function, Y, X -> Loss, Gradient, so here is how to write one, for both CPU and GPU:

val myLoss = new LossFunction[Double] {
  def apply(y: DenseMatrix[Double], x: DenseMatrix[Double])(implicit /* operators ... */): (DenseMatrix[Double], DenseMatrix[Double]) = (loss, gradient) // CPU
  def apply(y: CuMatrix[Double], x: CuMatrix[Double])(implicit /* operators ... */): (CuMatrix[Double], CuMatrix[Double]) = (loss, gradient) // GPU

The targets y and predictions x are given input to produce loss and gradient, which will be backpropagated into the raw output layer. The batch layout is row-wise, so you need to work with the matrices accordingly. Don't fear the long implicit parameters when implementing the trait, these operators come from Breeze and should be just fine. Also look at the predefined loss functions for a starting point how to work with them.

implicit def evidence[P <: Layer, V]: (P :: myLoss.type) EndsWith P = new ((P :: myLoss.type) EndsWith P) { }

To use your loss function with the front-end DSL, you need to provide evidence for compile time checks.

val L = Vector(3) :: Dense(3, Tanh) :: Dense(3, Tanh) :: myLoss
val net = Network(layout = L, settings)

Alternatively, you can work with the corresponding net implementation, passing the layout directly without any checks.

Using GPU

If your graphics card supports nVidia's CUDA (Compute Capability >= 3.0), you can train nets on the GPU, which is recommended for large nets with millions of weights and samples. On the contrary, smaller nets are faster to train on CPU, because while NeuroFlow is busy copying batches between host and GPU, CPU is already done.

To enable the GPU, you have to install the CUDA driver and toolkit (0.8.x). Example for Linux (Ubuntu 16.04):

curl -O
sudo dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
sudo apt-get install cuda-toolkit-8-0

With both driver and toolkit installed, you can import a GPU implementation for your model:

import neuroflow.nets.gpu.DenseNetwork._


We can save and load weights from nets with neuroflow.application.plugin.IO.

import neuroflow.application.plugin.IO._

val file = "/path/to/"
implicit val weights = File.weightBreeder[Double](file)
val net = Network(layout, settings)
File.writeWeights(net.weights, file)
val json = Json.writeWeights(net.weights)

The implicit weights to construct net come from File.weightBreeder. To save the weights back to file, we use File.writeWeights. The weight matrices are encoded in binary format.

To write into a database, we can use Json.writeWeights to retrieve a raw JSON string and fire a SQL query with it.


  waypoint = Some(Waypoint(nth = 3, (iter, weights) => File.writeWeights(weights, s"weights-iter-$")))

It is good practice to use the Waypoint[V] option for nets with long training times. The training process can be seen as an infinitely running wheel, and with waypoints we can harvest weights now and then to compute intermediate results. Another reason to use it is when something crashes, saved weights allow continuation of training from a recent point. Here, every nth = 3 step, the waypoint function is executed, receiving as input iteration count and a snapshot of the weights, which is written to file using File.writeWeights.