Minimalist Akka Observability

Non-intrusive native Prometheus collectors for Akka internals, negligible performance overhead, suitable for production use.

Are you running (or about to run) Akka in production, full-throttle, and want to see what happens inside? Did your load tests produce some ask timeouts? thread starvation? threads behaving non-reactively? old code doing nasty blocking I/O?
Would be nice to use Cinnamon Telemetry, but LightBend subscription is out of reach?
Overhead created by Kamon doesn't look acceptable, especially when running full-throttle?
Already familiar with Prometheus/Grafana observability stack?

If you answer 'yes' to most of the questions above, Akka Sensors may be the right choice for you:

Comprehensive feature set to make internals of your Akka visible, in any environment, including high-load production.
It is OSS/free, as in MIT license, and uses explicit, very lightweight instrumentation - yet is a treasure trove for a busy observability engineer.
Won't affect CPU costs, when running in public cloud.
Easy Demo/Evaluation setup included: Akka with Cassandra persistence, Prometheus server and Grafana dashboards.

Actor dashboard:

Dispatcher dashboard:

Features

Dispatchers

time of runnable waiting in queue (histogram)
time of runnable run (histogram)
implementation-specific ForkJoinPool and ThreadPool stats (gauges)
thread states, as seen from JMX ThreadInfo (histogram, updated once in X seconds)
active worker threads (histogram, updated on each runnable)

Thread watcher

thread watcher, keeping eye on threads running suspiciously long, and reporting their stacktraces - to help you find blocking code quickly

Basic actor stats

number of actors (gauge)
time of actor 'receive' run (histogram)
actor activity time (histogram)
unhandled messages (count)
exceptions (count)

Persistent actor stats

recovery time (histogram)
number of recovery events (histogram)
persist time (histogram)
recovery failures (counter)
persist failures (counter)

Cluster

cluster events, per type/member (counter)

Cassandra

Instrumented Cassandra session provider, exposing Cassandra client metrics collection.

requests
traffic in/out
timeouts

Java Virtual Machine (from Prometheus default collectors)

number of instances
start since / uptime
JVM version
memory pools
garbage collector

Demo setup

We assuming you have docker and docker-compose up and running.

Prepare sample app:

sbt "compile; project app; docker:publishLocal"

Start observability stack:

docker-compose -f examples/observability/docker-compose.yml up

Send some events:

for z in {1..100}; do curl -X POST http://localhost:8080/api/ping-fj/$z/100; done
for z in {101..200}; do curl -X POST http://localhost:8080/api/ping-tp/$z/100; done
for z in {3001..3300}; do curl -X POST http://localhost:8080/api/ping-persistence/$z/300 ; done

Open Grafana at http://localhost:3000.

Go to http://localhost:3000/plugins/sensors-prometheus-app, click Enable. Sensors' bundled dashboards will be imported.

Usage

SBT dependency

libraryDependencies ++= 
  Seq(
     "nl.pragmasoft.sensors" %% "sensors-core" % "0.2.2",
     "nl.pragmasoft.sensors" %% "sensors-cassandra" % "0.2.2"
  )

Prometheus exporter

If you already have Prometheus exporter in your application, CollectorRegistry.defaultRegistry will be used by default. To control this finely, AkkaSensors.prometheusRegistry needs to be overridden.

For an example of HTTP exporter service, check MetricService implementation in example application (app) module.

Application configuration

Override type and executor with Sensors' instrumented executors. Add akka.sensors.AkkaSensorsExtension to extensions.

akka {

  actor {

    # main/global/default dispatcher

    default-dispatcher {
      type = "akka.sensors.dispatch.InstrumentedDispatcherConfigurator"
      executor = "akka.sensors.dispatch.InstrumentedExecutor"

      instrumented-executor {
        delegate = "fork-join-executor" 
        measure-runs = true
        watch-long-runs = true
        watch-check-interval = 1s
        watch-too-long-run = 3s
      }
    }

    # some other dispatcher used in your app

    default-blocking-io-dispatcher {
      type = "akka.sensors.dispatch.InstrumentedDispatcherConfigurator"
      executor = "akka.sensors.dispatch.InstrumentedExecutor"

      instrumented-executor {
        delegate = "thread-pool-executor"
        measure-runs = true
        watch-long-runs = false
      }
    }
  }

  extensions = [
    akka.sensors.AkkaSensorsExtension
  ]
}

Using explicit/inline executor definition

akka {
 persistence {
  cassandra {
   default-dispatcher {
        type = "akka.sensors.dispatch.InstrumentedDispatcherConfigurator"
        executor = "akka.sensors.dispatch.InstrumentedExecutor"

        instrumented-executor {
          delegate = "fork-join-executor"
          measure-runs = true
          watch-long-runs = false
        }

        fork-join-executor {
          parallelism-min = 6
          parallelism-factor = 1
          parallelism-max = 6
        }
      }
    }
  }
}

Actors (classic)

 # Non-persistent actors
 class MyImportantActor extends Actor with ActorMetrics {

    # This becomes label 'actor', default is simple class name
    # but you may segment it further
    # Just make sure the cardinality is sane (<100)
    override protected def actorTag: String = ... 

      ... # your implementation
  }

 # Persistent actors
 class MyImportantPersistentActor extends Actor with PersistentActorMetrics {
  ...

Actors (typed)

val behavior = BehaviorMetrics[Command]("ActorLabel") # basic actor metrics
    .withReceiveTimeoutMetrics(TimeoutCmd) # provides metric for amount of received timeout commands
    .withPersistenceMetrics # if inner behavior is event sourced, persistence metrics would be collected
    .setup { ctx: ActorContext[Command] =>
      ... # your implementation
    }

Internal parameters

Some parameters of the Sensors library itself, that you may want to tune:

akka.sensors {
  thread-state-snapshot-period = 5s
  cluster-watch-enabled = false
}

Additional metrics

For anything additional to measure in actors, extend *ActorMetrics in your own trait.

trait CustomActorMetrics extends ActorMetrics  with MetricsBuilders {

  val importantEvents: Counter = counter
    .name("important_events_total")
    .help(s"Important events")
    .labelNames("actor")
    .register(metrics.registry)

}

Why codahale is used alongside Prometheus?

We would prefer 100% Prometheus, however Cassandra Datastax OSS driver doesn't support Prometheus collectors. Prometheus is our preferred main metrics engine, hence we brigde metrics from Codahale via JMX. This won't be needed anymore if Prometheus would be supported natively by Datastax driver.

jacum / akka-sensors 0.10.0