The Implementation of Michelson

Reading the Michelson interpreter

This is a series of posts I'll be making on the general structure of the Michelson interpreter and Tezos itself. These posts assume a working knowledge of OCaml, but not a huge degree of familiarity with GADTs or some of the other advanced features. This series may be come out of date as Tezos develops, so I'll be covering the parts of the language I think are less likely to change first. Each post has an attached date. The information contained therein is accurate as of that date, but may have become inaccurate in the intervening time. If you notice discrepancies, please get in touch. I'm happy to make corrections.

Part I: The Error Monad (October 6, 2017)

This probably wasn't where you were expecting me to start, was it? Well, I've picked it for a few reasons:

  1. I'm fairly confident that this part of the codebase will remain static until after the launch.
  2. People who want to understand the Tezos codebase but don't care about Michelson can still benefit from understanding this part of the codebase.
  3. It will become important later.
  4. It's elegant.

If you're not familiar with monads, go take a few minutes and read a tutorial. I personally got a lot out of this paper by Philip Wadler, but there are a ton of others available online. Find one that works for you. The error monad isn't terribly scary as Monads go, so once you feel like you understand the gist, come on back and see if you can understand what's going on.

I'm going to omit some convenience operations that a lot of monads provide in the examples below. If you want to add them, they're not difficult.

Why you want the error monad

In Tezos, we don't want to have the node be crashable by an improper input. To avoid this possibility, the lead developers decided that the system should not use exceptions for error handling. Instead, it uses an error monad. This design forces errors to be handled or carried through before an output can be used. Exceptions are still occasionally used, but this is mostly in the client and only for internal errors.

We also mix in the Lwt library, which we use for concurrency. This is combined with the error monad and is once again used pervasively throughout the codebase. The Lwt monad is a lot like promises in other languages.

Without further ado, let's write an error monad.

A simple version of the error monad

Here's a very simple error monad.

module Error : sig
  type 'a t
  (* Create a value of type t *)
  val return : 'a -> 'a t
  (* For when a computation fails *)
  val error : 'a t
  (* Apply an operation to a value in the error monad *)
  val (>>?) : 'a t -> ('a -> 'b t) -> 'b t (* bind *)
end = struct
  type 'a t = Ok of 'a | Error
  let return x = Ok x
  let error = Error
  let (>>?) value func =
    match value with
    | Ok x -> func x
    | Error -> Error
end

So, is this what Tezos uses? We actually already have a lot of the structure that we'll use later. The basic idea is that you return a value that's correct and return an error if the operation failed. Outside of the error module, you can't actually introspect an error value. You can only dispatch on the correctness/incorrectness of the value using bind.

What's wrong here?

  • We can't report any information about an error case
  • We can't report error traces, something that's used to improve the quality of error messages throughout Tezos
  • We can't handle some errors and continue executing

A slight improvement

Let's now enhance our error reporting by allowing errors to contain a description string. Now we can report messages along with our errors. Is this enough of an improvement? Not really. We don't have any flexibility about how the printing works. We still can't create error traces and we can't handle errors and resume executing the program.

module Error : sig
  type 'a t
  val return : 'a -> 'a t
  val error : string -> 'a t
  val (>>?) : 'a t -> ('a -> 'b t) -> 'b t (* bind *)
  val print_value : ('a -> string) -> 'a t -> unit
end = struct
  type 'a t = Ok of 'a | Error of string
  let return x = Ok x
  let error s = Error s
  let (>>?) value func =
    match value with
    | Ok x -> func x
    | Error s -> Error s
  let print_value func = function
    | Ok x -> Printf.printf "Success: %s\n" (func x)
    | Error s -> Printf.printf "Error: %s\n" s
end

Traces

Now that we have the basic structure down, we can add a mechanism to let us include traces. As a note, the error type I had above is exactly the result type from the OCaml standard library. The traces are just lists of error messages. If you have a call you think might fail, you can wrap that result in the trace function. If that call fails, an additional error is added.

module Error : sig
  type 'a t
  val return : 'a -> 'a t
  val error : string -> 'a t
  val (>>?) : 'a t -> ('a -> 'b t) -> 'b t (* bind *)
  val print_value : ('a -> string) -> 'a t -> unit
  val trace : string -> 'a t -> 'a t
end = struct
  type 'a t = ('a, string list) result
  let return x = Ok x
  let error s = Error [ s ]
  let (>>?) value func =
    match value with
    | Ok x -> func x
    | Error errs -> Error errs
  let print_value func = function
    | Ok x -> Printf.printf "Success: %s\n" (func x)
    | Error [ s ] -> Printf.printf "Error: %s\n" s
    | Error errors -> Printf.printf "Errors:\t%s\n" (String.concat "\n\t" errors)
  let trace error = function
    | Ok x -> Ok x
    | Error errors -> Error (error :: errors)
end

A more descriptive message

Even though traces are nice, we really want to be able to store more interesting data in the messages. We're going to use an extensible variant type to do this. Extensible variants allow us to add a new case to a variant type at the cost of exhaustivity checking. We're going to need two new mechanisms to make this work well. The first is an error registration scheme. In the actual error monad, this involves the data encoding module, which is how all data is encoded/decoded in Tezos. This module is another decently complicated part of the codebase that should probably the subject of a future post. Since you can declare arbitrary new errors, we'll have a way of adding a printer for each error.

When we add a new error handler, we'll use the register_handler function. This function will take a function that takes an error and returns a string option. These functions will look something like this:

type error += Explosion_failure of string * int;;

register_error
  (function
    | Explosion_failure (s, i) ->
        Some (Printf.sprintf "Everything exploded: %s at %d" s i)
    | _ -> None)

I'm also renaming the error function to fail. This is the convention used by the actual Errormonad module. I'm also exposing the ='a t= type so that you can dispatch on it if you need to. This is used several times in the Tezos codebase.

module Error : sig
  type error = ..
  type 'a t = ('a, error list) result
  val return : 'a -> 'a t
  val fail : error -> 'a t
  val (>>?) : ('a -> 'b t) -> 'a t -> 'b t (* bind *)
  val print_value : ('a -> string) -> 'a t -> unit
  val trace : error -> 'a t -> 'a t
end = struct
  type error = ..
  type 'a t = ('a, error list) result
  let fail error = Error [ error ]
  let return x = Ok x
  let (>>?) func = function
    | Ok x -> func x
    | Error errs -> Error errs
  let registered = ref []
  let register_error handler =
    registered := (handler :: !registered)
  let default_handler error =
    "Unregistered error: " ^ Obj.(extension_name @@ extension_constructor error)
  let to_string error =
    let rec find_handler = function
      | [] -> default_handler error
      | handler :: handlers ->
          begin match handler error with
            | None -> find_handler handlers
            | Some s -> s
          end
    in find_handler !registered
  let print_value func = function
    | Ok x -> Printf.printf "Success: %s\n" (func x)
    | Error [ s ] -> Printf.printf "Error: %s\n" (to_string s)
    | Error errors -> Printf.printf "Errors:\t%s\n" (String.concat "\n\t" (List.map to_string errors))
  let trace error = function
    | Ok x -> Ok x
    | Error errors -> Error (error :: errors)
end

Putting Lwt.t in the mix

Tezos uses the Lwt library for threading. The Lwt monad is mixed in with the error monad module. This requires us to add some extra combinators and reexport some functions from Lwt.

I'm also renaming the type t to tzresult, as used in the Tezos codebase.

module Error : sig
  type error = ..
  type 'a tzresult = ('a, error list) result
  val ok : 'a -> 'a tzresult
  val return : 'a -> 'a tzresult Lwt.t
  val error : error -> 'a tzresult
  val fail : error -> 'a tzresult Lwt.t
  val (>>?) : 'a tzresult -> ('a -> 'b tzresult) -> 'b tzresult (* bind *)
  val (>>=?) : 'a tzresult Lwt.t -> ('a -> 'b tzresult Lwt.t) -> 'b tzresult Lwt.t
  val (>>=) : 'a Lwt.t -> ('a -> 'b Lwt.t) -> 'b Lwt.t
  val print_value : ('a -> string) -> 'a tzresult Lwt.t -> unit Lwt.t
  val trace : error -> 'a tzresult Lwt.t -> 'a tzresult Lwt.t
end = struct
  type error = ..
  type 'a tzresult = ('a, error list) result
  let fail error = Lwt.return (Error [ error ])
  let error error = (Error [ error ])
  let ok x = Ok x
  let return x = Lwt.return (ok x)
  let (>>?) value func =
    match value with
    | Ok x -> func x
    | Error errs -> Error errs
  let (>>=) = Lwt.bind
  let (>>=?) value func =
    value >>= function
    | Ok x -> func x
    | Error errs -> Lwt.return (Error errs)
  let registered = ref []
  let register_error handler =
    registered := (handler :: !registered)
  let default_handler error =
    "Unregistered error: " ^ Obj.(extension_name @@ extension_constructor error)
  let to_string error =
    let rec find_handler = function
      | [] -> default_handler error
      | handler :: handlers ->
          begin match handler error with
            | None -> find_handler handlers
            | Some s -> s
          end
    in find_handler !registered
  let print_value func value =
    value >>= fun value ->
    begin match value with
      | Ok x -> Printf.printf "Success: %s\n" (func x)
      | Error [ s ] -> Printf.printf "Error: %s\n" (to_string s)
      | Error errors -> Printf.printf "Errors:\t%s\n" (String.concat "\n\t" (List.map to_string errors))
    end; Lwt.return ()
  let trace error value =
    value >>= function
    | Ok x -> return x
    | Error errors -> Lwt.return (Error (error :: errors))
end

That's all folks

Bam, that's the basic idea. The actual error monad in Tezos is a bit more complicated, but I've defined almost all of the functions that are commonly used. The largest differences are in the functions that deal with error registration. The Tezos compiler also splits the error monad between the client and node. Once again, this doesn't matter too much when you're reading the Michelson code.

I'll be back in part two, where we'll be looking at Michelson's typed AST.

Part II: The Typed AST (October 16, 2017)

Michelson is strongly typed. As one might expect, we typecheck a program before we run it. However, we go further. THe syntax tree itself is typed and catches a lot of bugs. For example, I recently added a few extra loop instructions to Michelson (they're not yet on master, but should be soon) and the code came directly from the types in the AST. This excellent design is due to Benjamin Canou, the architect of Michelson.

What's a GADT?

I wasn't overly familiar with GADTs before I came to Paris. Benjamin has done a lot to help explain the limitations and power of GADTs.

So, what is a GADT? It's like a dependent type, but weaker. A dependent type allows you to put the value in the type. With dependent types, you can say that a function only takes numbers greater than 3 or that a tree must be balanced. You can say that