zaterdag 4 juni 2016

In praise of incremental change

Hi everybody, welcome back. I'd like to share with you the things that have been happening in the MoarVM JIT since I've last posted, which was in fact March. To be brief, the JIT has been adapted to deal with moving frames, and I've started to rewrite the register allocator towards a much better design, if I do say so myself.

First, moving frames. jnthn has already written quite a bit about them. The purpose of it is to make the regular case of subroutine invocation cheaper by making special cases - like closures and continuations - a bit more expensive. I have nothing to add to that except that this was a bit of a problem for the JIT compiler. The JIT compiler liked to keep the current frame in a non-volatile register. These are CPU registers which are supposed not to be overwritten when you call a function. This is useful because it speeds up access of frequently used values. However, if the current frame object is be moved, the frame pointer in this register becomes stale. Thus, it had to go, and now we load the frame pointer from the thread context object (i.e. in memory), which never moves.

Unfortunately, that was not sufficient. Because MoarVM is an interpreter, control flow (like returning from a routine) is implemented updating the pointer to the next instruction (and the next frame). JIT compiled code never deals with this instruction pointer. Hence, whenever this instruction pointer could have been updated - we call this invokish or throwish code - the JIT may need to return control to the interpreter, which can then figure out what to do next. Originally, this had been implemented by comparing the frame pointer of the JIT frame - stored in the non-volatile register - with the frame pointer as understood by the interpreter - i.e., in the thread context object. This check no longer worked, because a): we didn't have a permanent pointer to the current frame anymore, and b): the current frame pointer might change for two different reasons, namely control flow and object movement.

I figured out a solution to this issue when I realized that what we really needed is a way to identify (cheaply) in the JIT whether or not we have changed control flow, i.e. whether we have entered another routine or returned out of the current one. This might be achieved by comparing immutable locations, but lacking those, another method is to simply assign increasing numbers to constructed frames. Such a sequence number then identifies the current position in the control flow, and whenever it is changed the JIT knows to return control to the interpreter. This caused some issues at first when I hadn't correctly updated the code in all places where the interpreter changed the current instruction, but afterwards it worked correctly. Special thanks go to lizmat who allowed me to debug this on Mac OS X, where it was broken.

Afterwards, I've focused on improving the register allocator. The primary function of a register allocator is to ensure that the values used in a calculations are placed in (correct) registers prior to that calculation. This involves, among other things, assigning the correct registers (some operations only work on specific registers), spilling registers to memory in order to make place, loading spilled values from memory if necessary, and ensuring that values in volatile registers are spilled prior to a function call. This was rather difficult because in the old design because, as it was inlined into the compilation step, it  couldn't really look behind or ahead, which is a problem if you want to place something correctly for future use. Furthermore, it allowed only for a one-on-one correspondence between a value that was computed and its current location in a register. That is a problem whenever -a value is copied to a different register, or stored in multiple memory locations.

So I've been, very slowly and methodically, in very small steps, moving code and structures through the JIT compiler in order to arrive at a register allocator that can handle these things. The first thing I did was remove the register allocation step out of compilation, into its own step (commit and another commit). Then I extracted the value descriptor structure - which describes in which location a value is stored - out of the expression tree (commit). I stored the tile list in a vector, in order to allow reverse and random access (commit). Because the register allocator works in a single pass and only requires temporary structures, I've 'internalized' it to its own file (commit one and commit two). Finally, I replaced the per-expression value structure with value descriptor structures (commit).

This places me in a position to replace register allocator structures (such as the active stack with an expiry heap), implement loads and stores, record register requirements per tile, implement pre-coloring, and correct allocation over basic blocks. All these things were impossible, or at least very difficult, with the old design.

What I think is interesting here is that in each of these commits, the logic of the program didn't substantially change, and the JIT continued to just as well as it had before. Nevertheless, all of this is progress - I replaced a rather broken design assumption (online register allocation with a value state machine) with another (offline register allocation with value descriptors) - that allows me to implement the necessary mechanics in a straightforward manner. And that, I think, demonstrates the power of incremental changes.