Jeff Ward
Mobile, Web, Linux, and other cool Tech

A Beginner's Guide to Hacking Haxe Macros

Jun 14, 2016 #Haxe #Macros #Development

Haxe macros are described on Haxe's website as "the most advanced feature of Haxe." But the topic is vast and complex. Even with the excellent set of macro examples in the Haxe cookbook, it can be daunting just to grasp the subtleties of working with macros. This article aims to give you the tools, tips, and links necessary so you can grok the examples and get hacking away at Haxe macros.

Disclaimer: I know a very small amount about macros and the Haxe compiler. The thoughts below are insights from my own struggle to work with macros. You may already know these insights, and I may not be technically accurate on all points. But hopefully this article will help some readers make the same critical connections I did while learning this stuff. And feel free to suggest corrections or improvements in the comments!

Expressions and Type, No Instances or Values

The core concept of macros is expressions. As has been stated elsewhere, all Haxe code is expressions. So effectively, macros are about generating, inspecting, and passing around Haxe code.

When a macro function is called, it doesn't return a value like normal functions -- instead the call gets replaced by the actual Haxe code (aka, expressions) returned from the macro. The Haxe manual calls this a syntax transform. (Technically, these are expression macros. There are other kinds of macros, as we'll see later.)

Macros can achieve this syntax transform because it all happens at compile-time. And unlike pre-processing your source code with some text replacement tool, macros are built into the compiler and have full access to the type system. So there are expressions and type, but not instances or values.

In other words, if you were to pass Math.PI to a regular function, it would get the Float value 3.14159... But if you pass Math.PI to a macro function, you're not passing a value. You're passing an expression that literally says "For the identifier called Math, look up the field PI". You might as well pass in FooBar.Monkey -- that's a perfectly valid and structurally identical expression. "For the identifier FooBar, look up the field Monkey."

However, there is also type and context information at compile time. The difference between Math.PI and FooBar.Monkey is, in the context of where those expressions were in the code, that is, where the macro was called from, the Math identifier is defined and Math.PI can be determined to be a Float, whereas FooBar is an unknown identifier.
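To make that difference concrete, here is a minimal sketch. The describe macro is a hypothetical helper of mine, not something from the Haxe standard library. It assumes that a typing failure inside Context.typeof can be caught with try/catch, which is how I understand it to behave, but treat that as an assumption.

```haxe
import haxe.macro.Context;
import haxe.macro.Expr;

using haxe.macro.ExprTools;

class Main {
  static function main() {
    trace(describe(Math.PI));        // Math is known where the macro is called
    trace(describe(FooBar.Monkey));  // FooBar is not, yet this is still a valid expression
  }

  // Hypothetical helper: returns a String literal summarizing the expression.
  public static macro function describe(e:Expr):Expr {
    // The macro receives code, not a number, so we can print the source text.
    var typeName = try {
      Std.string(Context.typeof(e));
    } catch (err:Dynamic) {
      // Assumption: an untypable expression surfaces here as a catchable error.
      "unknown (could not be typed here)";
    }
    var summary = e.toString() + " : " + typeName;
    // $v{} turns the compile-time String into a literal in the generated code.
    return macro $v{summary};
  }
}
```

Both calls compile because the macro never hands the original expression back to the compiler. It only returns a string that was computed at compile time.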

The Guts of an Expression

If you've ever cracked open a macro and done trace(e) on one of those expressions, you might have seen something like this:

Main.hx:6: { expr => ECall({ expr => EConst(CIdent(trace)), pos => #pos(Main.hx:6: characters 17-22) },[{ expr => EConst(CString(Hello World!)), pos => #pos(Main.hx:6: characters 23-37) }]), pos => #pos(Main.hx:6: characters 17-38) }

Ugh. This mess is an enum representation of the Abstract Syntax Tree for the simple statement: trace('Hello World!'). It's actually pretty cool that all Haxe code can be represented that way: as a tree structure encoded in enums. That representation makes perfect sense to the compiler, but it's a real pain to work with. But fear not -- while expressions are tedious, Haxe provides tools to help, and knowing your way around these tools is the key to working with macros.

Generating Expressions

Since your macro function must return an expression, you've got to know how to generate them. You could type out all those ugly EConst(CIdent(...)) enums by hand, but thankfully Haxe provides two mechanisms to avoid that tedium: Context.parse() and the macro keyword.
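For comparison, this is roughly what typing out those enums by hand looks like for the trace('Hello World!') statement shown earlier. It's only a sketch to show the tedium; helloByHand is a name I made up, and every node reuses the call-site position for simplicity.

```haxe
import haxe.macro.Context;
import haxe.macro.Expr;

class Main {
  static function main() {
    helloByHand();
  }

  // Hypothetical macro that assembles the AST for trace("Hello World!") manually.
  public static macro function helloByHand():Expr {
    var pos = Context.currentPos();
    // The identifier being called: trace
    var callee:Expr = { expr: EConst(CIdent("trace")), pos: pos };
    // The single String argument
    var arg:Expr = { expr: EConst(CString("Hello World!")), pos: pos };
    // A call expression combining the two
    return { expr: ECall(callee, [arg]), pos: pos };
  }
}
```

Three nodes for one tiny statement, which is exactly why the two shortcuts below are worth learning.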

Let's take a quick look at Context.parse() first. It takes a String of a Haxe expression, parses it into an AST, and returns an expression enum. And Haxe's String Interpolation makes working with string templates pretty easy. So this first example should be fairly straightforward:

import haxe.macro.Context;
class Main
{
  public static function main() { trace_build_age(); }

  public static macro function trace_build_age()
  {
    var build_time = Math.floor( Date.now().getTime()/1000 );

    var code = '{
      var run_time = Math.floor( Date.now().getTime()/1000 );
      var age = run_time - $build_time;
      trace("Right now it\'s "+run_time+", and this build is "+age+" seconds old");
    }';

    return Context.parse(code, Context.currentPos());
  }
}

There are a few key things going on here: the variable build_time gets the current timestamp when the macro is executed, at compile time. The code variable is a string (I used a multi-line string for formatting) that is comprised of the statements that I want this macro to return. Inside this code, there is another timestamp computation -- but remember, the code inside the string is going to be inserted into my program where I call this macro. So the run_time variable will be populated when the code runs. Using string interpolation, the expression to calculate the age of the build will have the build_time injected as a literal integer, for example:

var age = run_time - 1465762288;

And finally, the runtime code traces the current timestamp and the age of the build in seconds. For debug purposes, you can also trace your code before parsing and returning it.

You might have noticed that Context.parse actually takes two parameters, the second containing position data. This allows the compiler to report errors at sensible positions in the source code. Context.currentPos() is the position where the macro was called.

You might have also noticed that there are { } around my expressions in the code String. This is because Context.parse takes exactly one single expression, so to group and parse multiple expressions, I used a block (a single expression that contains an array of expressions.)

And since we're on the topic of { } brackets, be aware that they may be omitted in places where you might expect them -- a single expression doesn't need to be surrounded by brackets. This can make for some weird looking valid syntax, like the following:

function foo() trace('Look ma, no brackets');

Wait, why don't I see any string parsing in the examples?

I showed the string parsing example first, because it's abundantly clear what's happening. It separates the code which is executed at compile time (the code outside the string) from the code that is returned and executed at runtime (the code inside the string.) And with a string, it's obvious that you can generate and manipulate code in absolutely any way you like.

But string parsing introduces issues of escape sequences, and you may want to use string interpolation inside your generated code, so it'd get messy quickly.

Now watch this: the macro keyword does exactly the same thing as Context.parse (only with no strings attached, pardon the pun.) It consumes the following expression and instead of running it, parses it into an Expr enum value.

  public static macro function trace_build_age()
  {
    var build_time = Math.floor( Date.now().getTime()/1000 );

    var e = macro {
      var run_time = Math.floor( Date.now().getTime()/1000 );
      var age = run_time - $v{ build_time };
      trace("Right now it's "+run_time+", and this build is "+age+" seconds old");
    };

    return e;
  }

Now that's pretty slick. You can basically just write the code that you want to generate, slap the macro keyword in front of it (and a containing { } block if it's more than one expression), and you're generating expressions like crazy!

Achievement unlocked: You've just discovered expression reification. It provides an expr construction capability similar to Context.parse(), and like string interpolation, there are various constructs available for templatizing the output. Notice the $v{ } above, which you can think of as "breaking out" of the macro keyword and using the value of the build_time variable we set up. You can access a variable field on an object using obj.$fieldName. There are other reification $ constructs with different uses - I don't yet grok where I'd use all of them, but now that you and I get the general idea, when we see these $ in macro examples, we know where to look for info.
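Here is a small sketch that combines a few of those $ constructs in one place. The demo macro and its variable names are hypothetical. It only uses $v{ }, a bare $name for an Expr variable, and the .$fieldName form mentioned above.

```haxe
import haxe.macro.Expr;

class Main {
  static function main() {
    var word = "macro";
    trace(demo(word)); // generated code is equivalent to: 10 + word.length
  }

  // Hypothetical macro: adds a compile-time constant to a field of the caller's expression.
  public static macro function demo(target:Expr):Expr {
    var bonus = 10;           // a compile-time value
    var fieldName = "length"; // a compile-time String naming a field
    // $v{bonus}   -> the literal 10
    // $target     -> the expression the caller passed in
    // .$fieldName -> field access using the name stored in fieldName
    return macro $v{bonus} + $target.$fieldName;
  }
}
```

With word set to "macro", this should trace 15: the literal 10 was fixed at compile time, while word.length is evaluated at runtime.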

Technically you can generate types and classes with reification, too -- see the Haxe manual reification page for more details.

These are the very basics of generating expressions - with a good grasp of these concepts, you're on your way to understanding and wielding Haxe macros. But now on to...

Inspecting and Matching Expressions

In the examples we've used so far, we've simply generated and returned expressions. While this is critical, it's only part of the macro story. Macros also accept expressions and operate on class fields and other enums, so in most cases you'll need to inspect them and their various data members.

Let's jump right in and inspect an expression by tracing it and its type:

import haxe.macro.Context;
import haxe.macro.Expr;

using haxe.macro.ExprTools;

class Main
{
  public static function main()
  {
    var x = 4;
    inspect(x);
    inspect(x+0.5);
    inspect(x+"");
  }

  static macro function inspect(expr:Expr) {
    trace("----------> "+expr.toString());
    trace("expr: "+expr);
    var type = Context.typeof(expr); // a Type enum
    trace("type: "+type);
    return expr;
  }
}

The output of this example (which is printed at compile time because the trace statements are in the macro, which is executed at compile time) is:

Main.hx:19: ----------> x
Main.hx:20: expr: { expr => EConst(CIdent(x)), pos => #pos(Main.hx:13: characters 8-9) }
Main.hx:21: type: TAbstract(Int,[])
Main.hx:19: ----------> x + 0.5
Main.hx:20: expr: { expr => EBinop(OpAdd,{ expr => EConst(CIdent(x)), pos => #pos(Main.hx:14: characters 8-9) },{ expr => EConst(CFloat(0.5)), pos => #pos(Main.hx:14: characters 10-13) }), pos => #pos(Main.hx:14: characters 8-13) }
Main.hx:21: type: TAbstract(Float,[])
Main.hx:19: ----------> x + ""
Main.hx:20: expr: { expr => EBinop(OpAdd,{ expr => EConst(CIdent(x)), pos => #pos(Main.hx:15: characters 8-9) },{ expr => EConst(CString()), pos => #pos(Main.hx:15: characters 10-12) }), pos => #pos(Main.hx:15: characters 8-12) }
Main.hx:21: type: TInst(String,[])

Here we're passing a few different expressions to a macro, and it prints the expression using .toString() (a feature provided by using haxe.macro.ExprTools), the expression enum (without toString, it dumps the full enum contents), and its type. To determine the type we use Context.typeof(expr), which returns Type -- an enum describing the type of the expression. The compiler can determine the type of expressions in the context of where the macro was called from. In that context, it knows that x is an Int. Remember, x has no value at compile time, but Type inference knows that it is assigned an Int. The compiler also knows that x+0.5 results in a Float, and x+"" results in a String.

Alright, let's pass in a function and see what our macro traces out:

Main.hx:20: ----------> function add(a:Int, b:Int) return a + b
Main.hx:21: { expr => EFunction(add,{ args => [{ name => a, type => TPath({ name => Int, pack => [], params => [] }), opt => false, value => null },{ name => b, type => TPath({ name => Int, pack => [], params => [] }), opt => false, value => null }], expr => { expr => EReturn({ expr => EBinop(OpAdd,{ expr => EConst(CIdent(a)), pos => #pos(Main.hx:16: characters 42-43) },{ expr => EConst(CIdent(b)), pos => #pos(Main.hx:16: characters 44-45) }), pos => #pos(Main.hx:16: characters 42-45) }), pos => #pos(Main.hx:16: characters 35-45) }, params => [], ret => null }), pos => #pos(Main.hx:16: characters 8-45) }
Main.hx:23: TFun([{ name => a, t => TAbstract(Int,[]), opt => false },{ name => b, t => TAbstract(Int,[]), opt => false }],TAbstract(Int,[]))

Wow, so again, that's a lot of information for a fairly simple function. But we realize that the entire AST for the function is here -- the parameters, their types, and the function body are all stored in this enum.

So the first question is, how do we get at the data in the enum values? For example, the trace tells us that the expression's type is a TFun enum value with a name and t field, so you might be tempted to try a .name or .t accessor. But trace is peeking inside at the enum value and revealing its inner structure. TFun is actually just an enum value of the Type enum -- and we must use a switch statement to get its parameterized values out. If you're not familiar with Using Enums and Pattern Matching, now's a good time to go read up on those topics. They're critical to macros, because you'll be matching enums a lot.
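As a rough illustration, a switch over the Type returned by Context.typeof could pull the argument list out of TFun like this. The arity macro is a made-up example. I'm calling Context.follow first on the assumption that the compiler may hand back a not-yet-resolved type.

```haxe
import haxe.macro.Context;
import haxe.macro.Expr;

class Main {
  static function main() {
    trace(arity(function(a:Int, b:Int) return a + b));
  }

  // Hypothetical macro: replaces the call with the number of arguments the function takes.
  public static macro function arity(fn:Expr):Expr {
    return switch (Context.follow(Context.typeof(fn))) {
      case TFun(args, _):
        // args is an Array of { name, opt, t } structures, so .length works directly
        macro $v{args.length};
      default:
        // Not a function type: report a compile error at the argument's position
        Context.error("Expected a function expression", fn.pos);
    }
  }
}
```

For the two-argument function above, the call should be replaced by the literal 2. Note how the default case makes a deliberate choice to fail the build instead of silently ignoring the mismatch.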

In our function example above, at the outermost level we see an EFunction. We've also seen EConst and EBinop. These are expression enum values. They are defined in std/haxe/macro/Expr.hx, and there are helper functions in the ExprTools.hx utilities (which extend the functionality of Exprs via static extension.) You'll also find Type.hx and TypeTools.hx, which contain definitions and tools for type enums, like TAbstract, TDynamic, TInst, etc.

The *Tools.hx files are also a great resource for examples of how to pattern-match AST enums that make up expressions. I keep Expr.hx and ExprTools.hx (as well as Type.hx and TypeTools.hx) open for reference while working with macros. In addition, you'll discover lots of helpful utilities and information just browsing the files in the std/haxe/macro/ directory.

Ok, back to our function example. We can see that it's an EFunction. We look in Expr.hx at the EFunction definition, and we see:

enum ExprDef {
	...
	EFunction( name : Null<String>, f : Function );
	...
}

Ok, EFunction is an enum value of the ExprDef enum, and it's parameterized with a name:Null<String> and an f:Function. We can also find the definition of the Function typedef:

typedef Function = {
	var args : Array<FunctionArg>;
	var ret : Null<ComplexType>;
	var expr : Null<Expr>;
	@:optional var params : Array<TypeParamDecl>;
}

And over in ExprTools.hx, we see an example case statement that grabs the name and function and arguments of an EFunction. (Note that Function is a typedef, not an enum, so we can access the args field directly with the dot operator.) So let's use that example to access and trace the function arguments:

import haxe.macro.Expr;

class Main {
  public static function main() {
    inspect( function(a:Int, B:Int) return a+b );
  }

  static macro function inspect(myFunc:Expr) {
    switch (myFunc.expr) {
      case EFunction(name, func):
        for (arg in func.args) {
          $type(arg);
          trace(arg);
        }
      default:
        // the expression isn't an EFunction -- throw or ignore?
    }
    return macro {};
  }
}

The output here verifies that, indeed, the args are of type FunctionArg, and in this case they contain Int types:

Main.hx:12: { meta => [], name => a, type => TPath({ name => Int, pack => [], params => [] }), ??? => #pos(Main.hx:5: characters 22-23), opt => false, value => null }
Main.hx:12: { meta => [], name => B, type => TPath({ name => Int, pack => [], params => [] }), ??? => #pos(Main.hx:5: characters 29-30), opt => false, value => null }
Main.hx:12: characters 16-19 : Warning : haxe.macro.FunctionArg

Note that when using pattern matching you must cover all the cases, so you will need to decide what to do if your intended pattern(s) didn't match -- is it ok to ignore it (e.g. if you're just searching for some possible data), or was a match critical to your macro's desired functionality?

Pattern Matching With Reification

If you understand basic pattern matching, then pattern matching with reification (aka, the macro keyword) will hopefully make things a little bit less tedious. You can match your expression against an arbitrary expression written with the macro keyword. The following pattern matches if the input expression is x + y:

    switch (expr) {
      case (macro x+y):
        trace("Found it!");
      default:
    }

But not only that, you can match some of the expression literally, and some of the expression as a variable, so you can match x + anything:

      case (macro x+$addend):
        trace("Found x + "+addend); // addend is an Expr, of course

This is very helpful when scanning code for certain expressions to transform. And you can match Types and other AST enums this way, too.
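Put together as a complete program, a reification pattern might be used like this. describeSum is a hypothetical macro; it captures both operands of an addition as Expr variables and falls back to a default for anything else.

```haxe
import haxe.macro.Expr;

using haxe.macro.ExprTools;

class Main {
  static function main() {
    trace(describeSum(1 + 2));
    trace(describeSum(3 * 4));
  }

  // Hypothetical macro: reports whether the expression passed in is an addition.
  public static macro function describeSum(e:Expr):Expr {
    return switch (e) {
      case macro $left + $right:
        // left and right are Exprs captured from the matched addition
        var text = 'sum of ${left.toString()} and ${right.toString()}';
        macro $v{text};
      default:
        macro "not a sum";
    }
  }
}
```

The first call should trace "sum of 1 and 2" and the second "not a sum". The arithmetic itself is never evaluated, since the macro only looks at the shape of the code.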

So those are a few basics: refer to the enum definitions when you need them, and use pattern matching to extract the information you want. It can be tedious, and an IDE with code completion and peek definition support (like Visual Studio Code with a Haxe extension) is definitely helpful, but simple trace and $type calls are also very useful.

The best way forward from here is to keep looking at examples, try to understand what others are doing, and what tools and patterns they're using. Here are some gems that I've seen:

Inspiring Macro Examples

Now that you have a better idea of how macros work, you'll be better able to synthesize interesting examples. You'll find people doing really cool things with macros -- from creating shorthand syntax to extending the language to clever pattern matching. For example:

I first noticed pattern matching with reification in this awesome CFor example, where user DPeek is deftly matching a custom metatag with three arguments and a block, in effect creating a C-style for loop syntax (though, in this case, without support for continue and break.)

Jason O'Neil has a slick macro called CleverSort which yields a very nice syntax for invoking array sorting. It's a great example of a syntax transformation that provides users with simpler, cleaner code with less typing than is typically required.

Speaking of less typing, I wrote a macro called Lazy Props that turns constructor parameters into variable declarations and automatically assigns them. And a nice example in the Haxe macro cookbook takes the opposite approach and creates a constructor from a class's variable declarations.

Dan K / @nadako has an example using expression enums to parse JSON and provide position-aware error messages. That's putting those position fields on expressions to good use! Plus he's showing us that we can apply these concepts in a runtime program.

While not strictly a macro topic, Juraj's WWX2016 talk mentioned his tink_sql library, which is an excellent application of Haxe's powerful and extensible typing features.

And I've said it before, but you absolutely must check out the Haxe cookbook macros section.

Non-Static Macro Functions

Since most macro examples are static functions, I'll take a moment to point out that macros can be non-static member functions of classes.

Macros have something like an "automatic static extension" mechanism (which doesn't require the using statement, nice!) that allows you to create both static macro functions and regular member macro functions. It seems odd at first, since in the macro context there are no instances, to have member functions. But instead of the runtime implications, consider the object-oriented design implications: a class with member functionality can be instanced, it can be extended, and the fact that a member function is implemented as a macro should have no bearing on these OO principles. Indeed, this is what non-static macro functions provide. But they have a syntax consideration, much akin to static extensions: the first argument is an expression representing the instance. Here's a quick example:

Main.hx

package;

import Macro;

class Main
{
  public static function main()
  {
    var a = new Foo();
    var b = new Bar();

    a.member_func(4);
    b.member_func(5);
  }
}

Macro.hx

package;

import haxe.macro.Context;
import haxe.macro.Expr;

using haxe.macro.ExprTools;

class Foo {
  public function new() { }
  public macro function member_func(inst_expr:Expr, val:Expr)
  {
    trace("----------> "+inst_expr.toString());
    trace("inst_expr: "+inst_expr);
    trace("typeof inst_expr: "+Context.typeof(inst_expr));
    trace("local class: "+Context.getLocalClass());
    return macro trace("Hello!");
  }
}

class Bar extends Foo {
  public function new() { super(); }
}

When compiled, this program prints:

Macro.hx:12: ----------> @:this this
Macro.hx:13: inst_expr: { expr => EMeta({ name => :this, params => [], pos => #pos(Main.hx:12: characters 4-5) },{ expr => EConst(CIdent(this)), pos => #pos(Main.hx:12: characters 4-5) }), pos => #pos(Main.hx:12: characters 4-5) }
Macro.hx:14: typeof inst_expr: TInst(Foo,[])
Macro.hx:15: local class: Main
Macro.hx:12: ----------> @:this this
Macro.hx:13: inst_expr: { expr => EMeta({ name => :this, params => [], pos => #pos(Main.hx:13: characters 4-5) },{ expr => EConst(CIdent(this)), pos => #pos(Main.hx:13: characters 4-5) }), pos => #pos(Main.hx:13: characters 4-5) }
Macro.hx:14: typeof inst_expr: TInst(Bar,[])
Macro.hx:15: local class: Main

Notice that the first expression parameter received is @:this this, and its type is the type of the instance on which the function was called, either Foo or Bar. (Not to be confused with Context.getLocalClass, which is Main, since that's the class from where the macro was called and where the resulting expressions will be injected.) The other parameters come after the instance, just like with static extension.

Note that Foo and Bar had to be defined in a separate file from Main, otherwise you get an error: Cannot access static field member_func from a class instance. While this doesn't seem to make sense at first (member_func isn't static!), if you consider that the non-static macro pattern seems to follow the static extension paradigm, a using statement wouldn't work when targeting classes in the same file / module.

But unlike simple static extensions, classes that extend Foo also inherit the non-static macro functions. This is excellent, as it allows us to design seemingly regular classes with functions that happen to be macro implementations.

Macro Implementation Details

So far we've covered generating expressions and inspecting them, but it's still important to know where and how to implement macros, and avoid pitfalls along the way.

First, it's important to know that there are actually three distinct kinds of macros: expression macros, initialization macros, and build macros. Above we largely focused on expression macros, but I encourage you to read the section in the Haxe manual describing these -- in fact, read the whole section on Macros.

As you read the section on macro context, consider when and where your macros are run -- how build order may affect when they're invoked, what definitions are available, and how the calling context affects what information they have access to. You may have noticed #if macro pragmas lying around the examples. This is because there's a separate context in the compiler for macros to run in. The pragmas ensure that the contained code only gets compiled (or not) in the macro context, which avoids compile errors or build order issues. It can be helpful, especially when creating build macros or non-static macro functions, to put the macro code in a separate file from the normal code.
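Here's a minimal sketch of how an #if macro guard might be laid out. The BuildInfo class, hasDefine, and lookup are names I invented for illustration. The idea is that the helper calling into Context only exists when the file is compiled for the macro context.

```haxe
import haxe.macro.Expr;
#if macro
import haxe.macro.Context;
#end

class Main {
  static function main() {
    trace("debug build? " + BuildInfo.hasDefine("debug"));
  }
}

class BuildInfo {
  // Hypothetical macro: replaced at compile time by true or false.
  public static macro function hasDefine(name:String):Expr {
    return macro $v{lookup(name)};
  }

  #if macro
  // Only compiled in the macro context, where Context is usable.
  static function lookup(name:String):Bool {
    return Context.defined(name);
  }
  #end
}
```

Compiling with -debug should make the traced flag true, and without it false. As far as I can tell the body of a macro function is only typed in the macro context, so the reference to lookup doesn't cause trouble in the normal build.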

Finally, there are some harsh realities to macros. Perhaps foremost, macro processing can inflate your build duration. You can invoke the haxe compiler with --times and it will report the time spent on various tasks, including macro execution. I'm secretly hoping the new HL target will help speed macros up, but that could be a vain hope.

Conclusion

Phew, that's enough for now. Again, the topic of macros is so broad, I feel like we've only scratched the surface. But hopefully this is enough of a spark to ignite your curiosity for one of the most powerful features of the Haxe language. What will you do with Haxe macros?

Drop me a line in the comments or on Twitter @jeff__ward, or if you're so inclined, feel free to support my efforts to spread Haxe love around via patreon. Cheers!
