Tuesday, 27 November 2012

Dynamic Code Generation using F# Monads (Workflows)

Recently I wrote a fast object serializer in F# with dynamic code generation using the .NET DynamicMethod class and its ILGenerator which allows us to generate the CIL for the method. The serializer simply enumerates all the members of an object that should be serialized, and generates code for writing these to a stream (encoded in a certain way to reduce space overhead, e.g. by using variant integers à la Google protocol buffers).

There are many ways to generate this kind of code in .NET. One way is to use Expression trees, another is to generate actual source e.g. C# and going csc on it, and yet another is to use a DynamicMethod or a dynamic assembly/type. I chose the latter as it allows me to generate exactly the IL I want to generate and I'm a bit of a masochist. To illustrate this point, emitting code to call an interface method but using the opcode Call instead of Callvirt will cause funky, seemingly inexplicable errors - anything from the runtime destabilizing to it getting an epileptic fit.

F# has support for monads, which it calls "workflows" or "computation expressions" as that is no doubt more marketable. F# workflows allow us to write F# syntax, but interpret it in a custom way. In essence F# workflows are syntactic sugar for a series of function calls. I won't go into more details here (there's plenty on the web), so let's look at how we can use this to make code generation a little more pleasant. Consider the following bit of F# code:

 asm {
      let array = asm.ILGen.DeclareLocal(arrayType)
      let ret = asm.ILGen.DefineLabel()
      // ...

      yield OpCodes.Dup
      yield OpCodes.Stloc, array
      yield OpCodes.Ldlen
      yield OpCodes.Stloc, len
      for i:LocalBuilder in (0, len) do
         yield OpCodes.Ldloc, array
         yield OpCodes.Ldloc, i
         yield OpCodes.Ldelem, elementType
 
         // ... etc.
 
      yield Label(ret)
   }

The nice thing here is that this looks a little like C++ syntax for inline assembler. It is also more readable than a lot of method calls on an ILGenerator and in particular the for loop is elegant. The for loop shown actually emits the IL for a for loop from 0 to the local len. Finally, it is particularly finicky to get this working so in theory we could build in some more safety checks than that which ILGenerator provides (however we haven't done this).

asm in the code above is an F# computation expression backed by the following builder type:

type EmitBuilder(ilgen: ILGenerator) =
   member this.ILGen = ilgen
   member this.Yield (c: EmitOp) = 
      match c with 
      | Call m  -> ilgen.Emit(OpCodes.Callvirt, m)
      | Label l -> ilgen.MarkLabel(l)
      | Goto l  -> ilgen.Emit(OpCodes.Br, l)
 
   member this.Yield (u:unit) = u
   member this.Yield (o: OpCode) = ilgen.Emit(o)
   member this.Yield (inp: OpCode * int32) = match inp with (a, b) -> ilgen.Emit(a, b)
   member this.Yield (inp: OpCode * LocalBuilder) = match inp with (a, b) -> ilgen.Emit(a, b)
   member this.Yield (inp: OpCode * Type) = match inp with (a, b) -> ilgen.Emit(a, b)
   member this.Yield (inp: OpCode * FieldInfo) = match inp with (a, b) -> ilgen.Emit(a, b)
   member this.Yield (inp: OpCode * MethodInfo) = match inp with (a, b) -> ilgen.Emit(a, b)
   member this.Yield (inp: OpCode * Label) = match inp with (a, b) -> ilgen.Emit(a, b)
 
   member this.Delay (x: unit -> unit) = x
   member this.Run x = x()
   member this.Combine (x, y) = y()
 
   member this.Zero (u:unit) = ()
 
   member this.TryFinally (body: unit -> unit, final: unit -> unit) = 
      try
         ilgen.BeginExceptionBlock() |> ignore
         body()
      finally
         ilgen.BeginFinallyBlock()
         final()
         ilgen.EndExceptionBlock()
    
   member this.For (exp: 'a * 'b, body: LocalBuilder -> unit) =
      let start = ilgen.DefineLabel()
      let finish = ilgen.DefineLabel()
      let loopVar = ilgen.DeclareLocal(typeof<int32>)
      let (from, length) = exp
      this {
         match box from with
         | :? int32 as i -> ilgen.Emit(OpCodes.Ldc_I4, i)
         | :? LocalBuilder as l -> ilgen.Emit(OpCodes.Ldloc, l)
         | _ -> failwith "invalid from type"
 
         yield OpCodes.Stloc, loopVar

         // etc. 

Similarly we can add support for if, while and other F# language constructs.

How does this work?

F# translates the code within the curly braces to method calls on the asm EmitBuilder instance. The code above translates to something along the lines of:

asm.Run( 
  asm.Delay(
fun() ->   
     asm.Combine(
       
        asm.Yield(OpCodes.Dup), 
       
        asm.Delay(
fun () -> ...           
           asm.For((0, len), 
fun i -> ...


Clearly this isn't very nice to write, let alone nice to read. The downside of our monadic sugar may be that it isn't always obvious exactly what is going on where.

For your perusal I put this code on Github, find it here.

No comments:

Post a Comment