Register vs. stack based VMs
Traditionally the virtual machine architecture of choice has been the stack machine; benefits include simplicity of VM implementation, ease of writing a compiler back-end (most VMs are originally designed to host a single language) and code density (i.e., executables for stack architectures are invariably smaller than executables for register architectures).
For a stack architecture to be an effective solution, two conditions need to be met:
- The generated code has to ensure that the top of stack is kept in sync with where the next instruction expects it to be. For instance, on its return a function cannot leave stuff lying around on the stack like it can leave values in registers (whose contents can simply be overwritten).
- Instruction execution needs to be generally free of state, so an add-two-integers instruction should not have to consult some state variable to find out the size of integers being added. When the value of such state variables have to be saved and restored around function calls, they effectively become VM registers.
Cobol is one language where it makes more sense to use a register based VM. I wrote one and designed two machine code generators for the MicroFocus Cobol VM and always find it difficult to explain to people what a very different kind of beast it is compared to the VMs usually encountered.
Parrot, the VM designed as the target for compiled PERL, is register based. A choice driven, I suspect, by the difficulty of ensuring a consistent top-of-stack and perhaps the dynamic typing of the language.
On register based cpus with 64k of storage, the code density benefits of a stack based VM are usually sufficient to cancel out the storage overhead of the VM interpreter and support a more feature rich application (provided speed of execution is not crucial).
If storage capacity is not a significant issue and a VM has to be used, what are the runtime performance differences between a register and stack based VM? Answering this question requires compiling and executing the same set of applications for the two kinds of VM. Something that until 2001 nobody had done, or at least not published the results.
A comparison of the Java (stack based) VM with a register VM (The Case for Virtual Register Machines) found that while the stack based code was more compact, fewer instructions needed to be executed on the register based VM.
Most VM instructions are very simple and take relatively little time to execute. When hosted on a pipelined processor the main execution time overhead of a VM is the instruction dispatch (Optimizing Indirect Branch Prediction Accuracy in Virtual Machine Interpreters) and reducing the number of VM instructions executed, even if they are larger and more complicated, can produce a worthwhile performance improvement.
Google has chosen a register based VM for its Android platform. While licensing issues may have been a consideration, there are a number of technical advantages to this decision:
- A register VM is likely to have an intrinsic performance advantage over a stack VM when hosted on a pipelined processor.
- Byte code verification is likely to be faster on a register VM (i.e., faster startup times) because stack height integrity checks will be greatly simplified.
- A register VM will be more forgiving of incorrect code (in the VM, generated by the compiler, code corrupted during program transmission or storage attacked by malware) than a stack VM.
This is a very interesting post – thanks.
I see your thoughts about register based vms but have some other considerations which you don’t seem to have addressed. In the case of the JVM the stack based vm is Just In Time compiling the byte code as methods connect to the execution graph. This means that the performance of the byte code is less important than its ability to describe the intention of the code in such a way that the optimizer can take advantage.
In a similar but different way, the Micro Focus vm produces intermediate code. Why is this called intermediate code, because the NCG (native code generator) is usually used to compile this down to bare metal instructions.
Declare interest – I work for MF!
In recent comparisons, we have been finding that COBOL compiled to JVM byte codes and then run using the JITing JVM compares well with COBOL compiled to int code and then put through the optimizing NCG to give native code.
What does this mean? I think it indicates that the benefits you are seeing with register based VMs, whilst real, are largely irrelevant when the VM code is compiled down to bare metal before execution.
Best wishes – AJ
Good point Alex, JIT compilation is an important issue. Which architecture is best from the JIT perspective? I suspect that the same amount of information about the code can be extracted from both architectures and given sufficient time the generated code quality will be essentially the same.
The purpose of JIT code generation is to improve the performance of a currently executing program, so it cannot hang around looking for fancy optimizations that sometimes produce savings. Given a fixed amount of processing time would faster machine code be generated from a register or stack VM? Effectiveness of Cross-Platform Optimizations
for a Java Just-In-Time Compiler provides time/quality numbers for a stack VM, I don’t know of any similar figures for a register based VM.
Of course if the architecture of a register VM is close to that of the target hardware the JIT generator will have some of its work done for ‘free’, eg, register allocation. Google’s Davik VM is a long way from looking like an Intel x86, an Arm processor would be a much better fit.
That is very interesting information on compiled Cobol performance (MicroFocus used to call the NCG’s ‘Optimizers’, because they optimized performance). I last worked on one of these ‘optimizers’ over 15 years ago, and things might have changed a bit (perhaps bits of the Sparc generator are still in use :-O), and what was considered important back then was generating good code for data movement involving various combinations of source/destination storage alignment + fancy algorithms for doing arithmetic on values represented various non-two’s compliment forms. I recall that some analysis we did showed that only 5% of code had to be handled by calls to library functions.
I am surprised to hear that the performance of code generated by going via the JVM is comparable to that produced by MicroFocus’s NCG. Perhaps you are running on a platform where misaligned storage accesses don’t incur a large penalty or perhaps there is a high percentage of library calls?
What is often not considered when comparing stack and register based VMs is that SSA form (a corner stone of any reasonable optimizing JIT compiler) falls out for free in a stack based VM. In particular anything on the stack at the end of a basic block needs to have a phi node. Calling Java VMs stack based is also confusing as the local variables could be thought of as registers, and are in simple JITs often mapped to fixed registers.
I think the biggest complaint about register based VMs is going to be that they make assumptions of some underlying architecture that they are targeting. Given this and the extra work they must do to construct an SSA form, I think they need to work harder to convince the world that they are better than stack based VMs (the paper cited above included).