Saturday, June 14, 2014

Progress

Those of you who have followed #moarvm or GitHub closely may already know, but this week I've finally checked in code that calculates 2 + 2 = 4 and returns that value to its caller. To be very specific, I can make a frame that runs the following operations:


const_i64_16 r0, 2
const_i64_16 r1, 2
add_i r2, r1, r0
return_i r2
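
For the curious, here is a standalone C sketch of the same trick - hand-assembled machine code rather than the real JIT's DynASM output, but it shows the whole round trip of writing code to a page, marking it executable and non-writable, and calling it:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    /* x86-64 for: mov eax, 2; mov ecx, 2; add eax, ecx; ret
     * - the moral equivalent of the four MoarVM ops above */
    unsigned char code[] = {
        0xb8, 0x02, 0x00, 0x00, 0x00,   /* mov eax, 2 */
        0xb9, 0x02, 0x00, 0x00, 0x00,   /* mov ecx, 2 */
        0x01, 0xc8,                     /* add eax, ecx */
        0xc3                            /* ret */
    };

    /* allocate a writable page and copy the code into it */
    void *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return 1;
    memcpy(mem, code, sizeof code);

    /* flip the page to executable and non-writable (security folks!) */
    mprotect(mem, 4096, PROT_READ | PROT_EXEC);

    int (*fn)(void) = (int (*)(void))mem;
    printf("%d\n", fn());               /* prints 4 */
    return 0;
}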

As a proof of concept, this is a breakthrough, and it shows that the strategy we've chosen can pay off. I couldn't have done it without the help of FROGGS, jnthn, nwc10, timotimo and others, but we're finally there. I hope. (I'll have to see about Windows x64 support.) The next thing to do is cleanup and extension. Some objectives for the following week are:
  • Cleanup. The JIT compiler still dumps stuff to stderr for my debugging purposes, but we shouldn't really have that. I've tried moving all output to the spesh log, but I can hardly find the data in there, so I think I'll make a separate JIT log file instead. Similarly, the file for the JIT compiler's machine code dump - if any - should be configurable. And I should add padding to the dump, so that more than one block can be dumped.
  • Adding operations to compile. MoarVM supports no fewer than 638 opcodes, and I support only 4 so far. That is about 0.62% of all opcodes :-). Obviously, in those terms, I have a long way to go. jnthn suggested that the specialized sp_getarg opcodes are a good way to progress, and I agree - they'll allow us to pass actual arguments to a compiled routine.
  • Translate the spesh graph out of SSA form into the linear form that we use for the JIT 'graph' (which is really a labeled linked list so far; see the sketch just after this list).
  • Compile more basic blocks and add support for branching. This is probably the trickiest thing of the bunch.
  • Fix myself a proper Windows x64 virtual machine, and do the Windows testing myself.
  • Bring the moar-jit branch up to date with MoarVM master, so that testers don't have such a hard time.
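
To give an idea of what that linear form might look like: one node per instruction, each optionally carrying a label so branches have something to point at. The field names below are my own invention, not the actual moar-jit structures:

/* hypothetical shape of the JIT 'graph': a labeled linked list of
 * instructions in execution order (assumes MoarVM's spesh headers) */
typedef struct JitNode JitNode;
struct JitNode {
    JitNode     *next;   /* linear successor - no SSA, no CFG edges */
    MVMint32     label;  /* branch target marker, or -1 if none */
    MVMSpeshIns *ins;    /* the instruction to emit code for */
};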
As for longer-term goals, we've had some constructive contact with Mike Pall (of LuaJIT / DynASM fame), and he suggested ways to extend DynASM to support dynamic registers. As I tried to explain last week, this is important for 'good' instruction selection. On further reflection, it will probably do just fine to introduce expression trees - and the specialized compiler backend for them, which would need register selection - gradually, i.e. per supported instruction rather than all at once.

However, the following features are more important still:
  • Support for deoptimisation. Up until now (and for the foreseeable future) we keep the memory layout exactly the same as the interpreter's, which should keep deoptimisation relatively simple.
  • JIT-to-interpreter calls. This is a bit tricky - MoarVM doesn't support nesting interpreters. What we'll have to do instead is return to the interpreter with a label that stores our continuation, and continue at that continuation when we return. (A sketch of this scheme follows the list.)
  • At some point, JIT-to-JIT calls. Much the same problems apply - in theory, this doesn't have to differ from JIT-to-interpreter calls, although obviously we'd rather optimise the interpreter out of this loop.
  • Support for exceptions, obviously, which - I hope - won't be as tricky as it seems, as it ultimately depends on jumping into the bytecode at the right place.
  • Support for simple optimisations, such as merging various MoarVM opcodes into a single opcode if that is more suitable.
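
To make the JIT-to-interpreter idea above a little more concrete, here is a rough sketch of the scheme - every name in it is hypothetical, and it is certainly not MoarVM code:

/* The JIT'ed frame never nests an interpreter. When it needs one
 * (say, to make a call), it returns a label telling this driver
 * where to resume once the interpreter is done. */
#define JIT_DONE -1

typedef MVMint32 (*JitCode)(MVMThreadContext *tc, MVMint32 resume_at);

void interpret_until_return(MVMThreadContext *tc);  /* hypothetical */

static void run_jitted_frame(MVMThreadContext *tc, JitCode code) {
    MVMint32 label = 0;              /* start at the entry point */
    for (;;) {
        label = code(tc, label);     /* run until return or bail-out */
        if (label == JIT_DONE)
            return;                  /* the frame returned normally */
        /* the JIT'ed code bailed out to make a call: have the
         * interpreter run the callee, then resume at 'label' */
        interpret_until_return(tc);
    }
}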
So that is it for now. See you next week!

    Monday, June 9, 2014

    News

    Today is the day I both created an implementation of the 'JIT graph' and destroyed it. (Or rather, stashed it away in a safe branch, but you get the point.) The current HEAD of moar-jit has nothing that deserves a name like 'JIT graph'; it is merely a thin layer around MVMSpeshGraph. So I thought maybe I should explain why I did this, what the consequences are, and what I'll do next.

    First of all, let me explain why we wanted a 'JIT graph' in the first place, and what I think it ought to be. MoarVM contains a bytecode specialization framework called spesh. My current project to write a JIT compiler can be seen as an extension of this framework; the core data structure of spesh - namely, MVMSpeshGraph - is also the input to the JIT compiler. I've promised a thorough walkthrough of spesh and you'll get it, but not today; today I have another point to make. That point is that although the spesh graph is the result of some sophisticated transformations on the source bytecode, it is in essence still MoarVM bytecode. It still refers to MoarVM instructions and MoarVM registers.

    Now that is perfectly alright if you want to eventually emit MoarVM instructions, as spesh has done up until now. However, there is still quite a layer of abstraction between MoarVM and the physical processor that runs your instructions. For example, in MoarVM, acquiring the value of a lexical is as simple as a single getlex instruction. For the CPU, there are several levels of indirection involved to do the same, and quite possibly a loop. The goal of the 'JIT graph', then, was to bridge these levels of abstraction. In effect, it is to make the job of the (native) code generator much simpler.
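
    For instance, a single getlex conceptually expands into something like the following - a pseudocode-level sketch with the frame walk spelled out; the field names are illustrative rather than MoarVM's actual layout:

    /* what 'getlex' roughly means at the machine level: walk out a
     * number of scopes, then index into that frame's lexical storage */
    MVMFrame   *f      = tc->cur_frame;
    MVMuint16   outers = 2;          /* scope levels, from the operand */
    while (outers--)                 /* one pointer chase per level... */
        f = f->outer;
    MVMRegister value  = f->env[5];  /* ...and then the actual load */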

    I think the best way to explain this is with an example. Given the following MoarVM instruction:
    
    add_i r0, r1, r2
    

    I'd like to construct the following tree:
    
    store --> address --> moar-register(r0)
          \-> value --> add --> load --> moar-register(r1)
                            \-> load --> moar-register(r2)
    

    I think we can all criticize this structure for being verbose, and you'd be correct, but there is a point here. This structure is suitable for tree-matching and rewriting during code generation - in short, for generating good code. (Simpler algorithms that emit lousy code work too :-)). There are many nice things I could say about this structure. But it depends critically on my ability to select the registers on which operations take place. And as it turns out, on x86-64, I can't. Or on any other architecture than x86. Oh, and LuaJIT doesn't actually use DynASM in its JIT compiler, what do you know.
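
    In code, nodes for such a tree might have looked something like this - purely illustrative, roughly the sort of thing that got stashed away in that branch:

    /* expression-tree nodes for the add_i example above */
    typedef enum { J_STORE, J_LOAD, J_ADD, J_MOAR_REG } JitExprKind;

    typedef struct JitExpr JitExpr;
    struct JitExpr {
        JitExprKind kind;
        MVMuint16   moar_reg;     /* used when kind == J_MOAR_REG */
        JitExpr    *left, *right; /* store: address/value; add: operands */
    };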

    Actually, I kind of could've guessed that from the LuaJIT source. But I didn't, and that is my own dumb fault.

    So, what to do next? There are two - or three, or four - options, depending on your level of investment in the given tools. One such option is to forgo register selection altogether and use static register allocation, which is what I did next. If we do that, there is truly no point in having a complicated graph, because all the information is already contained in the MoarVM instructions themselves, and because you can't do anything sensible between instructions - with static register allocation, the registers are always the same. In essence, it means translating the interpreter into assembly. For most instructions, this approach is trivial - it could be done by a script. It is also rather unambitious, and will never lead to much better performance than what the interpreter can do: maybe 2x, but not the 10x that I think should be doable.
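
    Concretely, static allocation means each opcode gets exactly one code template. A sketch, with made-up emit_* helpers and scratch register names:

    /* every add_i compiles to the same pattern; only the register-file
     * offsets taken from the operands differ (all helpers hypothetical) */
    static void jit_one_ins(CodeBuffer *buf, MVMSpeshIns *ins) {
        switch (ins->info->opcode) {
        case MVM_OP_add_i:
            emit_load (buf, SCRATCH_A, ins->operands[1].reg.orig);
            emit_load (buf, SCRATCH_B, ins->operands[2].reg.orig);
            emit_add  (buf, SCRATCH_A, SCRATCH_B);
            emit_store(buf, ins->operands[0].reg.orig, SCRATCH_A);
            break;
        /* ... one fixed template per opcode - mechanical enough that
         * a script could generate all of them ... */
        }
    }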

    The other option is to do register selection anyway, on top of DynASM, just because. I'm... not sure this is a great idea, but it isn't a terrible idea, either. In essence, it involves writing or generating giant nested switch structures that emit the right code to DynASM - everywhere, for every instruction in which you'd want this. I don't think that is particularly tractable by hand, but it would be for a preprocessor.
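
    For illustration, such a nested switch would look roughly like this in DynASM source (a .dasc fragment that still needs preprocessing, so not directly compilable - the shape is the point):

    /* choose a destination register at runtime by enumerating every
     * case by hand - and repeat this for each instruction that needs it */
    switch (dst_reg) {
    case 0: | mov rax, [rbx+offset]
            break;
    case 1: | mov rcx, [rbx+offset]
            break;
    case 2: | mov rdx, [rbx+offset]
            break;
    /* ... thirteen more cases, times every instruction ... */
    }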

    The third option is to fix DynASM to do dynamic register allocation on x86-64 and any other architecture that needs it. This is possible - we maintain a fork of DynASM - but it'd involve deep diving into the internals of DynASM. What is more, Mike Pall, who is vastly more capable than I am, decided not to do it, and I'm fairly sure he had his reasons. The fourth option is to look for another solution than what DynASM provides. For while it is certainly elegant and nice, it may not be what we ultimately want.


    Wednesday, June 4, 2014

    Goals and subgoals

    Hi everybody! As it seems that a JIT compiler doesn't fall into place fully formed in a weekend, I've decided to set myself a few goals - along with smaller subgoals that I hope will help keep me on track. The immediate goal for the week is to compile a subroutine that adds two numbers and returns the result, like so:
    
    sub foo() {
        return 3 + 4;
    }

    Which is about as basic as you can get. Nevertheless, quite a few parts have to be up and moving to get this to work. Hence the list. So without further ado, I present to you:

    • Modifying the Configure / Make files to run DynASM and link the resulting file.
    I've actually already done this, and it was more complicated than it seems, and I'm still not completely happy about it.
    • Obtaining writable memory that can be marked executable
    • Marking said memory executable and non-writable (security folks!)
    I plan to do this by hijacking MVM_platform_allocate_pages(), which nobody uses right now.
    • Determine, for a given code graph, whether we can JIT compile it. 
      • Called MVM_can_jit_graph(MVMSpeshGraph*); a sketch of the idea follows this list.
    • Transforming a Spesh graph into a JIT graph
      • Note that I don't know yet what that JIT graph will look like.
      • I think it will hold values along with their sizes, though. I'm not sure the spesh graph does that. 
    • Directly construct our very simple code graph, by hand, using MAST.
    • JIT compiling the very simple code graph of our code.
    • UPDATE: attach a JIT code segment to an MVMStaticFrame
    • Calling and returning from that code.
    This... will probably be a bit experimental - it's of no use to throw in a full-fledged register allocation and instruction selection algorithm to add two constant numbers. We can - in principle - also do without these, but it will lead to rather poor machine code. 

    I've probably forgotten quite a few things in here. But this seems like a start. If there's something you think I missed, please comment :-)