Monday, 12 October 2015

Rumors of JITs' demise are greatly exaggerated.

Earlier this week my attention was brought to an article claiming that the dusk was setting for JIT compilation. Naturally, I disagree. I usually try to steer clear of internet arguments, but this time I think I may have something to contribute. Nota bene, this is not a Perl- or Perl 6-related argument, so if that is strictly your interest this is probably not an interesting post for you.

The main premise of the argument is that people are shifting away from JIT compilation because the technique has failed to live up to its promises. Those promises include, in various forms, high level languages running 'as fast as C', or having more optimization possibilities than ahead-of-time (AOT) compilers do. Now my perspective may be a bit unusual in that I don't actually expect momentous gains from JIT compilation per se. As I've described in the talk I gave at this year's YAPC::EU, by itself JIT compilation removes only the decoding and dispatch steps of interpretation, and - depending on the VM architecture - these may be a larger or smaller proportion of your running time. However, my thesis is that interpretation is not why high-level languages are slow, or rather, that interpretation is only one of the many sources of indirection that make high-level languages slow.

First of all, what of the evidence that JITs are actually in decline? The author provides three recent trends as evidence, none of which I hold to be decisive. First, both Windows 10 and the newest versions of Android translate .NET and Dalvik applications respectively to native code at installation time, which is properly considered ahead-of-time compilation. Second, high-performance JavaScript applications are currently often created using tools like emscripten, which compiles to asm.js, and this is in many ways more similar to object code than to a high-level language, implying that the difficult bit of compilation is already behind us. (I agree mostly with that assessment, but not with its conclusion.) Finally, on iOS devices JIT compilation is unsupported (except for the JIT compiler in the WebKit browser engine), allegedly because it is insecure.

As to the first piece, the author suggests that the main reason is that JIT compilers are unpredictable in their output, at least relative to optimizing ahead-of-time compilers. I think that is nonsense; JIT compilation patterns tend to be quite reliably the same on different runs of the same program, a property I rely on heavily during e.g. debugging. The output code is also pretty much invariant, with an exception being the actual values of embedded pointers. So in my experience, what you see (as a developer) is also what you get (as a user), provided you're using the same VM. I humbly suggest that the author believes JITs to be unreliable because his work is being compiled by many different VMs using many different strategies. But I see that no differently than any other form of platform diversity. Maybe the author also refers to the fact that the optimization effectiveness, and hence the resultant performance, of JIT-compiled applications is often sensitive to minor and innocuous changes in the application source code. But this is true of any high-level language that relies primarily on optimizing compilers, for C as much as for Python or JavaScript. The main difference between C and Python is that any line of C implies far fewer levels of indirection and abstraction than a similar line of Python.

I think I have a much simpler explanation as to why both Google and Microsoft decided to implement ahead-of-time compilation for their client platforms. The word 'client' is key here, because I think we're mostly talking about laptops, smartphones and tablets. As it turns out, hardware designers and consumers alike have decided to spend the last few years' worth of chip manufacturing improvements on smaller, prettier form factors (and hopefully longer battery life) rather than computing power. Furthermore, what Qualcomm, Samsung etc. have given us, Photoshop has taken away. The result is that current generation portable devices are more portable and more powerful (and cheaper) than ever but are still memory-constrained.

JIT compilation inevitably comes with a significant memory cost from the compiled code itself (which is generally considerably larger than the interpreted code was), even when neglecting the memory usage of the compiler. Using various clever strategies one can improve on that a bit, and well-considered VM design is very important as always. But altogether it probably doesn't make a lot of sense to spend precious memory on JIT-compiled routines in a mobile setting. This is even more true when the JIT compiler in question, like Dalvik's, isn't really very good and the AOT compiler has a good chance of matching its output.

Now to the case of asm.js. As I said, I agree mostly that a significant amount of work has already been done by an ahead-of-time compiler before the browser ever sees the code. It would be a mistake to think that therefore the role of the JIT (or rather the whole system) can be neglected. First of all, JIT-compiled code, even asm.js code, is greatly constrained in comparison to native code, which brings some obvious security benefits. Second of all, it is ultimately the JIT compiler that allows this code to run cross-platform at high performance. I think it is mistaken to suggest that this role is trivial, and so I see asm.js as a success of, rather than evidence against, JIT compilation as a technique.

Next, the iOS restriction on JIT compilation. I think the idea that this would be for security reasons is only plausible if you accept the idea that application security is significantly threatened by dynamic generation of machine code. While I'm sure that the presence of a JIT compiler makes static analysis very difficult - not to say impossible - I don't believe that this is the primary attack vector of our times. The assertion that memory must be both writable and executable for a JIT compiler to work is only superficially true, since there is no requirement that the memory must be both at the same time, and so this doesn't imply much of a threat (so-called W^X memory is becoming a standard feature of operating systems). Vtable pointers stored in the heap, and return addresses on a downward-growing stack, now those are attack vectors of note.

But more importantly, that is not how mobile users are being attacked. It is much more interesting, not to mention significantly easier, for attackers to acquire whole contact books, private location information, credentials and private conversations via phishing and other techniques than it is to corrupt a JIT compiler and possibly, hopefully, and generally unreliably gain remote execution. Most of these attack vectors are wide open indeed and should be closed by actual security techniques like access control rather than by outlawing entire branches of computing technology. Indeed, an observer not sympathetic to Apple could probably relate this no-JIT-compilation rule to the Californian company's general attitude to competing platforms, but I will not go further down that path here.

Finally, I think the claim that JIT compilation can't live up to its promise can readily be disproven by a simple Google search. The reason is simple; the JIT compiler, which runs at runtime, has much more information at its disposal than even the best of ahead-of-time compilers. So-called profile-guided optimization helps to offset the difference, but it is not a common technique, and profile data is still only a small subset of the information available to a JIT compiler. The fact that many systems do not match this level of performance (and MoarVM's JIT compiler certainly doesn't) is of course relevant but not, in my opinion, decisive.

In conclusion, I would agree with the author that there are many cases in which JIT compilation is not suitable and in which AOT compilation is. However, I think the much stronger claim that the dusk is setting on JIT compilation is unwarranted, and that JIT compilers will remain a very important component of computing systems.