input-output-hk / metronome   0.4.0

Apache License 2.0

Checkpointing PoW blockchains with HotStuff BFT

Scala versions: 2.13 2.12

Metronome

Metronome is a checkpointing component for Proof-of-Work blockchains, using the HotStuff BFT algorithm.

Overview

Checkpointing provides finality to blockchains by attesting to the hash of well-embedded blocks. A proper checkpointing system can secure the blockchain even against an adversary with super-majority mining power.

The Metronome checkpointing system consists of a generic BFT Service (preferably HotStuff), a Checkpoint-assisted Blockchain, and a Checkpointing Interpreter that bridges the two. This structure enables many features, including flexible BFT choices, multi-chain support, a plug-and-play forensic monitoring platform via the BFT service, and the capability of bridging trust between two different blockchains.

Architecture

BFT Service: A committee-based BFT service with a simple and generic interface. It takes consensus candidates (e.g., checkpoint candidates) as input and generates certificates for the elected ones.

Checkpoint-assisted Blockchain: Maintains the main blockchain that accepts and applies checkpointing results. The checkpointing logic is delegated to the checkpointing interpreter below.

Checkpointing Interpreter: Maintains checkpointing logic, including the creation and validation (via blockchain) of checkpointing candidates, as well as checkpoint-related validation of new blockchain blocks.

Each of these modules can be developed independently, with only minor data structure changes required for compatibility. This independence allows flexibility in the choice of BFT algorithm (e.g., variants of OBFT or HotStuff) and checkpointing interpreter (e.g., simple checkpoints or Advocate).

The architecture also enables a convenient forensic monitoring module. By simply connecting to the BFT service, the forensics module can download the stream of consensus data and detect illegal behaviors such as collusion, and identify the offenders.

Architecture diagram

Component diagram

BFT Algorithm

The BFT service delegates checkpoint proposal and candidate validation to the Checkpointing Interpreter using 2-way communication to allow asynchronous responses as and when the data becomes available.

Algorithm diagram

When a winner is elected, a Checkpoint Certificate is compiled, comprising the checkpointed data (a block identity, or something more complex) and a witness for the BFT agreement, which proves that the decision is final and cannot be rolled back. Because of the need for this proof, low-latency BFT algorithms such as HotStuff are preferred.

Build

Requirements

  • Mill, a build tool for Scala.
  • JDK 11

Building the project

The project is built using Mill, which works fine with Metals.

To compile everything, use the __ wildcard:

mill __.compile

The project is set up to cross build to all Scala versions for downstream projects that need to import the libraries. To build a specific version, put it in square brackets:

mill metronome[2.12.10].checkpointing.app.compile
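
Some shells (zsh, for example) treat unquoted square brackets as a glob pattern and refuse to run the command when nothing matches, so quoting the target can help. The sketch below only assembles and prints quoted commands; mill_target is a hypothetical helper, not something the project provides, and the two Scala versions are simply the ones used in this README.

```shell
# Hypothetical helper: builds a Mill target name for one Scala version.
# $1 is the Scala version, $2 is the module path plus task.
mill_target() {
  printf 'metronome[%s].%s\n' "$1" "$2"
}

# Print a quoted compile command for each cross-built version used in this README.
for v in 2.12.10 2.13.4; do
  echo mill "'$(mill_target "$v" checkpointing.app.compile)'"
done
```

The printed lines can be pasted into a terminal as they are; the single quotes keep the shell from interpreting the brackets.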

To run tests, use the wildcard again and the .test postfix:

mill __.test
mill --watch metronome[2.13.4].rocksdb.props.test

To run a single test class, use the .single method with the full path to the spec. Note that ScalaTest tests are in the specs subdirectories while ScalaCheck ones are in props.

mill __.storage.specs.single io.iohk.metronome.storage.KVStoreStateSpec
mill __.hotstuff.consensus.props.single io.iohk.metronome.hotstuff.consensus.basic.ProtocolStateProps

To experiment with the code, start an interactive session:

mill -i metronome[2.13.4].hotstuff.consensus.console

Versions

You will need Java 11 to build.

The Mill version is set in the .mill-version file or the MILL_VERSION environment variable. To build with Nix in a sandbox environment, make sure that the build works with the Mill version that Nix comes with, because in the sandbox it is not going to dynamically download the one set in the project.
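
To see which version a checkout would ask for, the two sources can be inspected with a small function. This is a sketch under an assumption: it lets a non-empty MILL_VERSION take precedence over the .mill-version file, which is how common Mill launcher scripts behave but is not stated in this README. The name resolve_mill_version is hypothetical.

```shell
# Hypothetical helper: prints the Mill version requested for a project directory.
# Assumption: MILL_VERSION, when set and non-empty, wins over the .mill-version file.
# The body runs in a subshell so its variables do not leak into the caller.
resolve_mill_version() (
  dir=${1:-.}
  if [ -n "${MILL_VERSION:-}" ]; then
    printf '%s\n' "$MILL_VERSION"
  elif [ -f "$dir/.mill-version" ]; then
    # Only the first line of the file is used.
    head -n 1 "$dir/.mill-version"
  else
    echo "no Mill version configured" >&2
    exit 1
  fi
)
```

Running resolve_mill_version in the project root and comparing the result with the version Nix provides is one way to spot a mismatch before a sandboxed build.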

Formatting the codebase

Please configure your editor to use scalafmt on save. CI will be configured to check formatting.

Publishing

We're using the VersionFile plugin to manage versions in a semi-manual fashion.

The initial version has been written to the file without newlines:

echo -n "0.1.0-SNAPSHOT" > versionFile/version

Builds on develop will publish the snapshot version to Sonatype, which will be overwritten if the version number is not updated with the next PR. Thus, this version should not be used in nixified builds, because the hash of the dependency will change with each PR that adds to the snapshot.

During publishing on master we will use mill versionFile.setReleaseVersion to remove the -SNAPSHOT postfix and make a release. After that the version number should be bumped on develop, e.g. mill versionFile.setNextVersion --bump minor.

With that in mind, the release process of a new feature looks as follows:

  1. Say develop is on version 0.1.0-SNAPSHOT.
  2. Create a feature branch to work on something new, reflecting the ticket number, e.g. git checkout -b pm-1001-new-feat. Prefix each commit message by the ticket number.
  3. Open a PR from pm-1001-new-feat to develop. No need to modify the version in the PR; you can accumulate as many features and fixes as you like in develop.
  4. Merge the PR into develop by squashing all commits into one, so each ticket becomes one commit in the history.
  5. Go to 2 until ready to release.
  6. To release, open a PR from develop to master.
  7. Merge the PR into master by creating a merge commit, not a squash, so the history of master preserves the individual (squashed) ticket commits.
  8. The publishing of master will push 0.1.0 to Sonatype. We should not merge more stuff into 0.1.0-SNAPSHOT.
  9. Create a branch to bump the version on develop to 0.2.0-SNAPSHOT, e.g. git checkout develop; git checkout -b bump-0.2.0; mill versionFile.setNextVersion --bump minor.
  10. Alternatively, the bump can be done as part of the next PR; after the first major release this will be the better option if you don't yet know whether the next change is going to be backwards compatible.
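
The version arithmetic in the steps above can be imitated in a few lines of shell. This is only an illustrative sketch: release_version and next_snapshot are hypothetical helpers, not part of the VersionFile plugin, and they assume plain MAJOR.MINOR.PATCH versions with an optional -SNAPSHOT suffix.

```shell
# Hypothetical helpers; they only imitate the version arithmetic described above.

# Strip a trailing -SNAPSHOT suffix, e.g. 0.1.0-SNAPSHOT becomes 0.1.0.
release_version() {
  printf '%s\n' "${1%-SNAPSHOT}"
}

# Bump one segment of a MAJOR.MINOR.PATCH version and append -SNAPSHOT.
# $1 is the current version, $2 is one of: major, minor, patch.
# The body runs in a subshell so its variables do not leak into the caller.
next_snapshot() (
  base=${1%-SNAPSHOT}
  major=${base%%.*}
  rest=${base#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  case "$2" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown bump: $2" >&2; exit 1 ;;
  esac
  printf '%s\n' "${major}.${minor}.${patch}-SNAPSHOT"
)
```

For instance, release_version 0.1.0-SNAPSHOT prints 0.1.0, the version published from master, and next_snapshot 0.1.0 minor prints 0.2.0-SNAPSHOT, the version develop moves to in step 9.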

To do a hotfix, fix the bug on master instead of doing another release from develop:

  1. Create a hotfix branch, e.g. git checkout master; git checkout -b fix-nasty-bug.
  2. Fix the bug, then run mill versionFile.setNextVersion --bump patch to increment the patch version to 0.1.1 (assuming this is the first bug after the release above).
  3. Open a PR from fix-nasty-bug to master.
  4. Merge the PR into master by squashing all commits into one.
  5. Open a PR from master to develop. Make sure the version number stays 0.2.0-SNAPSHOT, since the next feature release can incorporate the bug fix, and we don't want to push another 0.1.1 to Sonatype.
  6. Merge into develop and carry on adding features.

Example Applications

The examples module contains applications that demonstrate the software at work with simplified consensus use cases.

Robot

The robot example is about the federation moving around a fictional robot on the screen. Each leader proposes the next command to be carried out by the robot, once consensus is reached. The setup assumes 4 nodes, with at most 1 Byzantine member.
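
The figure of 4 nodes with at most 1 Byzantine member follows from the usual BFT bound n >= 3f + 1. The sketch below works through that arithmetic; max_faulty and quorum_size are hypothetical helpers, and the n - f quorum rule is the common convention for this class of protocols, assumed here rather than taken from Metronome's code.

```shell
# Largest number of Byzantine members f that n nodes can tolerate, from n >= 3f + 1.
max_faulty() {
  echo $(( ($1 - 1) / 3 ))
}

# Votes needed for a quorum under the common n - f rule (an assumption here).
quorum_size() {
  echo $(( $1 - ($1 - 1) / 3 ))
}
```

With these definitions, max_faulty 4 prints 1 and quorum_size 4 prints 3, so under this assumption the federation can keep making progress with one node misbehaving or offline.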

To test it, start 4 consoles and run commands like the following, $i going from 0 to 3:

./metronome/examples/robot.sh $i

The detailed logs should be in ~/.metronome/examples/robot/logs/node-$i.log, e.g.:

$ tail -f ~/.metronome/examples/robot/logs/node-0.log
14:03:03.607 ERROR i.i.m.h.s.tracing.ConsensusEvent Error {"message":"Error processing effect SendMessage(ECPublicKey(ByteVector(64 bytes, 0xcb020251d396614a35038dd2ff88fd2f1a5fd74c8bcad4b353fa605405c8b1b8c80ee12d2a10b1fca59424b16890c8115fbc94a68026369acc3c2603595e6387)),NewView(5,QuorumCertificate(Prepare,0,ByteVector(32 bytes, 0xb978b34fc4e905a727065b3e18d941a44a8349d8251514debbee5d6ddb94d430),GroupSignature(List()))))","error":"Connection with node ECPublicKey(ByteVector(64 bytes, 0xcb020251d396614a35038dd2ff88fd2f1a5fd74c8bcad4b353fa605405c8b1b8c80ee12d2a10b1fca59424b16890c8115fbc94a68026369acc3c2603595e6387)), has already closed"}
14:03:04.736 DEBUG i.i.m.networking.NetworkEvent ConnectionFailed {"key":"23fcab42e8f1078880b27aab4849092489bfa8d3e3b0faa54c9db89e89223c783ec7a3b2f8e6461b27778f78cea261a2272abe31c5601173b2964ef14af897dc","address":"localhost/127.0.0.1:40003","error":"io.iohk.scalanet.peergroup.PeerGroup$ChannelSetupException: Error establishing channel to PeerInfo(BitVector(512 bits, 0x23fcab42e8f1078880b27aab4849092489bfa8d3e3b0faa54c9db89e89223c783ec7a3b2f8e6461b27778f78cea261a2272abe31c5601173b2964ef14af897dc),localhost/127.0.0.1:40003)."}
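
When all four nodes are running, it can be handy to follow their logs together. The sketch below only prints the log paths, using the directory layout quoted above; robot_logs is a hypothetical helper and the default of 4 nodes mirrors the setup described for this example.

```shell
# Hypothetical helper: prints the log path of each robot node, one per line.
# $1 is the number of nodes (defaults to 4, the size of the example federation).
# The body runs in a subshell so its variables do not leak into the caller.
robot_logs() (
  count=${1:-4}
  i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s\n' "$HOME/.metronome/examples/robot/logs/node-$i.log"
    i=$((i + 1))
  done
)
```

With a tail that accepts several files (GNU and BSD tail do), tail -f $(robot_logs) follows every node at once, provided the home directory path contains no spaces.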

To clear out everything before a restart, just run rm -rf ~/.metronome/examples/robot.

Running the Checkpointing Service

First generate some ECDSA keys to be used by the federation, as well as one to be used by the PoW interpreter (it has to be different from the key used by the service):

$ mill metronome[2.13.4].checkpointing.app.run keygen > service-keys.json
[424/424] metronome[2.13.4].checkpointing.app.run
$ cat service-keys.json
{
  "publicKey" : "ab5944b35a12f87133b5cf525b7a2ecc698a059b4d46898c4f58970e73069aeebeb55765ade41d781120c27ef8a88ae1cb2ff5c2e70345373b524dcfcb6636d5",
  "privateKey" : "057b39a793c06683b4ebec95456f576be4c44e4404e126f0a46689d259209a75"
}
$ mill metronome[2.13.4].checkpointing.app.run keygen > interpreter-keys.json
[424/424] metronome[2.13.4].checkpointing.app.run

The results can be parsed for example with jq, as seen in the example below.

Create a config file to provide the necessary settings which the default application.conf doesn't have. For example:

cat <<EOF >example.conf
include "/application.conf"

metronome {
  checkpointing {
    federation {
      self {
        host = $(dig +short myip.opendns.com @resolver4.opendns.com)
        port = 40000
        private-key = $(jq -r ".privateKey" service-keys.json)
      }

      # Append the other nodes you create here.
      others = [
      ]
    }
    local {
      interpreter {
        public-key = $(jq -r ".publicKey" interpreter-keys.json)
      }
    }
  }
}
EOF
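
Because the interpreter key has to be different from the service key, a quick check before starting the service can catch a copy-paste mistake. The following is a minimal sketch: json_hex_field and keys_differ are hypothetical helpers, and the sed pattern assumes the pretty-printed keygen layout shown earlier, with one "name" : "hex" pair per line. It uses sed only so that it runs without extra tools; jq would do the same job.

```shell
# Hypothetical helper: extracts a hex string field from a keygen JSON file.
# $1 is the field name, $2 is the file.
# Assumes one "name" : "hex" pair per line, as in the keygen output shown earlier.
json_hex_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([0-9a-fA-F]*\)".*/\1/p' "$2"
}

# Succeeds only if both files hold a public key and the two keys differ.
# The body runs in a subshell so its variables do not leak into the caller.
keys_differ() (
  a=$(json_hex_field publicKey "$1")
  b=$(json_hex_field publicKey "$2")
  [ -n "$a" ] && [ -n "$b" ] && [ "$a" != "$b" ]
)
```

A possible use is keys_differ service-keys.json interpreter-keys.json || echo "service and interpreter keys must differ" >&2.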

Build the service into a fat JAR so we can pass system properties when we run it:

SCALA_VER=2.13.4
ASSEMBLY_JAR=${PWD}/out/metronome/${SCALA_VER}/checkpointing/app/assembly/dest/out.jar
mill metronome[$SCALA_VER].checkpointing.app.assembly

Start the service by pointing it at the example configuration:

$ java -cp $ASSEMBLY_JAR -Dconfig.file=example.conf io.iohk.metronome.checkpointing.app.CheckpointingApp service
13:22:02.853 WARN  i.i.m.h.s.tracing.ConsensusEvent Timeout {"viewNumber":7,"messageCounter":{"past":0,"present":0,"future":0}}
13:22:02.895 WARN  i.i.m.c.s.tracing.CheckpointingEvent InterpreterUnavailable {"messageType":"CreateBlockBodyRequest"}

Detailed logs should appear in ~/.metronome/checkpointing/logs/service.log.