Not logged in, Join Here! or Log In Below:  
News Articles Search    

 Home / General Programming / My VM Stack Account Manager
Archive Notice: This thread is old and no longer active. It is here for reference purposes. This thread was created on an older version of the flipcode forums, before the site closed in 2005. Please keep that in mind as you view this thread, as many of the topics and opinions may be outdated.

May 02, 2005, 05:37 AM

I'm trying to decide how to implement my VM stack. At the moment my VM only allows for declarations of simple types, so I just had a simple array of stack_item which was a union of those simple types. This made coding pretty easy, although it may be a bit wasteful because of double datatypes.

However, it was pointed out to me that I should make local (non-simple) objects also on the stack. Should I forego the current method and implement it as a flat char* or equivalent? Or should I use map an int32_t* onto the stack so that all my reads/writes are 32-bit aligned? Is 64-bit alignment becoming important?


May 02, 2005, 05:50 AM

Betelgeuse wrote: Is 64-bit alignment becoming important?

I suppose you target x86 (or x86-64), then 64bit alignement - data wise - is only relevant for doubles (and then only for performance reasons); and if you plan to support packed things then it's 128bit and not just for decoration.


May 02, 2005, 06:36 AM

I don't quite understand in what context you mean packed things. Do you mean SIMD?

Anyway, it seems the answer is to stick with 32bit for now. I can always fudge in 128bit alignment when it is needed.


May 02, 2005, 07:39 PM

Yes, packed as in Packed Scalars. There's 2 sides to x86 SIMD, scalars & packed scalars.
While scalars just ask for 32bit alignement, packed scalars mandate 128bit.
You better pay attention to them as the fpu is deprecated in x86-64 ABIs so everything is moved around in those SIMD registers.


May 03, 2005, 04:55 AM

Let me preface this by saying that I'm by no means an expert on compiler design, so I can't vouch for how optimal what I'm describing is. To me it seems like the best trade-off between efficiency and ease of implementation. Also, I'm assuming that the VM is running the output of a compiler of some sort (if I'm wrong and by 'VM' you mean the second pass of a two-pass interpreter, by all means make your expression stack take every conceivable user-defined type).

Assuming this is a compiler, these types of decisions seem to depend heavily on how the VM is used, and what it's intended to accomplish. It sounds like your VM is type-aware: in contrast, mine is entirely symbol-agnostic, so the only types it's aware of are the native register types (it's still interpreting bytecode, but I wanted to leave open the possibility for a relatively painless transition to JIT in the future). So, one thing I would do to start off, is try to assess whether or not the VM needs to have any knowledge of language-specific types.

In my case, the language is compiled down to pseudo-assembly (that looks suspiciously like 80x86), which treats user-defined types in a manner that's similar to what a native compiler would do. By the time the assembler gets hold of the output, all it sees are operations on register types and decorated fixup names for labels/addresses. By the time the VM gets hold of the result, it sees no symbols at all.

There are pros and cons to this. The pros are that it's very easy to do operator overloading, copy constructors, etc, without a whole lot of special code. In general, it seems like by hiding type-specific information from the part that actually runs the code, you can encapsulate functionality on those types in very small areas, and mostly in the expression accumulator and expression stack. The cons are that this results in a larger number of instructions for the same program, so the VM jump table gets bigger (although probably not by as much it sounds, plus the smaller instruction granularity might allow more room for optimization on the compiler side). Another problem with this approach is that you get tied to a type of virtual machine or processor. Modern C++ compilers seem to use intermediate, almost higher-level languages, partially in order to abstract the architecture from the compiler. With what I'm doing, since my VM is essentially a stack machine (two accumulators, a stack register and a limited use stack frame register), and I don't have an intermediate language, it would be fairly painful to try and suddenly make the compiler generate code for a processor-rich architecture with a completely different instruction set (and usage).

In practice, with a type-agnostic VM, the usage of the stack for local variables becomes fairly similar to a native implementation. First, all registers are saved. Second, a register (call it ebp) is used to store the stack pointer. The stack pointer is then incremented by the total amount of local storage, including all local variables (note that the function parameters have already been pushed on the stack by the caller). This is intended to leave room for the locals; since locals don't have to be declared at the top of the function body, this makes single-pass compilation mildly tricky (requiring a pointer fixup for actual value of sp, since you can't infer how much to increment esp by until you compile the entire function). Locals are then accessed relatively to ebp (eg., [ebp+10] for a local, or [ebp-8] for a parameter -- these values are known, since variables are declared by the time you use them).

This is how that works with user-defined types, from the compiler's standpoint. I have a class called Value that is used as the expression stack element, with a derived Accumulator class that overloads bits of the interface. Values hold various things like Variable pointers if they're a variable, Type class instances, etc, as well as their opcode text (eg, "[ebp+10]"). User-defined types construct large portions of the Value members, and define the opcode text while they're at it. For a class or a struct, this means that all you have to do to make every member of a built-in type (int, float, etc) just work, is to handle the member selection operator (I'm handling it as a sort of a weird special-cased postfix unary operator, same with array indices). For user-defined members of a user-defined aggregate type, you just repeat the process until you arrive at a built-in type or an array of built-in types. Normally, with a simple variable in a stack machine, you'd move it to the accumulator, and then either push the accumulator or perform an operation, depending on the context. The only difference with user-defined types is that the member selection operator offsets the address first.

Quick note -- you _do_ have to store locals on the stack. Putting them on the heap makes them act as static variables, essentially, so a recursive function writing to a local variable will write to a shared copy of it, which is usually not the desired outcome. This also means that the aggregate types have to set their members' opcode text every time they're instantiated (since that's dynamic).

Anyway, to sum up:

Compiler-side: _expresssion_ stack elements and _expression_ accumulator (no direct mapping to the VM stack and VM accumulator) are aware of their type and what they are as an opcode operand (if they're a simple type). The type class allocates data for values. Operations, implicit casts and explicit casts are implemented as command objects, indexed by type (or type pair or type triple), and operate on dumb data inside the value instance. Variables are doubly linked with value objects. Aggregate variables contain an offset table with members as (simple or aggregate) variable objects.

Assembler-side: expects symbolic names for labels and some of the offsets, infers label addresses and replaces those, as well as symbolic offsets, with numeric values.

VM-side: register types only. The stack takes 32-bit values (or 64 or whatever you're targeting), and the registers are set up accordingly.

Hope some of this helps.



May 04, 2005, 12:03 PM

Thanks, golt... your discussions are always so helpful yet seem to induce so many rewrites =)

For now I'm going to make the stack model machine assembly as much as possible (without it getting too complicated) so that the JIT possibility remains in the future. For now I only want to get the thing up and running so that it is useable, but I don't want to screw the pooch and implement a design aspect that will have to be undone later.


May 04, 2005, 07:28 PM

Doh, the username threw me off. I guess we've already discussed most of this then :). I agree about trying to target something that looks like native assembly. One thing that might help, would be to fully finalize the assembler and VM first. Those parts are quite simple in comparison to the compiler, and the design is not difficult to harden. If full instructions have some sort of a consistent structure that's opcode-independent, eg., something like (pardon the half-assed BNF) instruction ::= opcode ( [address_mode, imm32] || [0, reg32] )*, parsing the assembly and interpreting the bytecode becomes pretty trivial.

The first thing I did before starting to convert the interpreter to a compiler, was to write the assembler and VM, finalize the instruction set and instruction format, and write fairly large amounts of raw pseudo-assembly and some unit tests to make sure it all worked. Haven't had to touch either one of them since.

As for the rewrites, I think that's somewhat unavoidable here (or in any large-ish project, for that matter). What's avoidable is ending up in a situation where the refactoring turns into a huge unmanageable undertaking, due to not refactoring continuously in small increments.


Michael Walter

May 04, 2005, 10:16 PM

Quick note -- you _do_ have to store locals on the stack. Putting them on the heap makes them act as static variables, essentially, so a recursive function writing to a local variable will write to a shared copy of it, which is usually not the desired outcome.

That is wrong [1], you can surely allocate activation records on the heap. Whether you want to do so solely depends on the semantics of the language you're implementing (ex.: the semantics of C suggest using the stack, whereas you might want to use the heap for Scheme).

[1] You seem to be thinking of dynamic scope?


May 05, 2005, 12:28 AM

Just as a side-note to the initial question and as was mentioned, if you plan on ever doing say 16-byte aligned (SIMD) structs/classes, you will need more than a simple 'ebp' pointer for locals vs function parameters. Well, assuming you're using a stack entry size less than your maximum alignment requirements.

If not, your local stack pointer will need to be aligned per function. That is, you will have to set ebp = alignN( sp, N ) at the start of a function -- where N is the max alignment of any local/temp for that function. Knowing that ebp will be properly aligned, you can get the placement of locals correct (reserving space for all declared locals and aligned temps once at the top of the function to make this easier).

To get to your function's parameters, you need another 'register' that points to the first one say (basically what ebp was before aligning it). And you'll need a variant(s) of your PUSH opcode to push from either register.

You can probably do some tests w/ VC++ + aligned structs and see the assembly for an example of what I described. You'll also find that it doesn't let you pass aligned structs/classes by value, for good reason, but you can return them. Returning them basically requires generating hidden, temp-locals into that block I mentioned before.

We're use 4-byte stack entries since we don't use doubles or int64s. So our Vec3/4s are what force alignment hell.



May 05, 2005, 03:12 AM

Point taken -- I didn't phrase that very well. First of all, I'm thinking in terms of a C++-like language, and second, by 'putting locals on the heap', I meant essentially allocating/constructing them once (but only after the declaration is executed, unlike a C/C++ static), and not destroying them when the program leaves the scope, so they can be reused.

Now that I've typed that up, it seems like a pretty random assumption, but I hadn't considered placing activation records on the heap because it just doesn't seem to make any sense in a C++ context: you'd be essentially creating a makeshift stack, but with no guarantee of contiguity, a lot of unnecessary heap fragmentation, and no benefits I can think of.

Just curious, why do you say that it would make more sense in Scheme? I'm not familiar enough with it to know how local variables would be allocated, but I thought Scheme was lexically scoped, so functions only need access to their own locals and to their superscope (correct me if I'm wrong). And with a dynamically scoped language (eg Lisp), you'll be doing a lot of late binding anyway, so almost none of my rant really applies.


Michael Walter

May 05, 2005, 08:25 AM

Scheme has call-with-current-continuation, which suggests such an implementation.

If "Lisp" refers to "Common Lisp", all but global variables are lexically scoped .


May 05, 2005, 03:34 PM

Continuations are somewhat related to 'yield' in Python, aren't they? You take a snapshot of the stack frame and hold a pointer to the current instruction (which is what a continuation is, iirc). I sort of ended up reinventing bits of the generator concept in Python when I designed the language I'm working on (didn't know about generators, vaguely knew about yield).

Haven't implemented that bit yet, but in my case, the yield keyword literally stops the VM until it's re-entered (long story), so this is not a problem -- it just means the stack is not reset when the VM stops for the current timeslice. The heap is shared between VMs, the stack is not (for that reason).

Anyway, that's a good point, though. It'd be fairly painful to mix stack-allocated activation records with heap-allocated ones, in a generic yield setting (if you kept a separate stack for generators), and copying them from the stack to the heap in order to take a snapshot would probably defeat the benefits of having a stack to begin with, so just putting all of them on the heap makes sense in that case.



May 05, 2005, 07:16 PM

Pardon, I really should qualify what I said in the second paragraph. In my case, the functions that can yield, are functions that are the entry points to the VM. So there is no calling context, and no real yield stack -- those functions control program flow, and doing a yield actually involves less work than doing a return.



May 06, 2005, 07:49 AM

I've taken a wee look at the Java bytecode way of doing things, and it seems to be a good way of doing things, though Java seems to separate local vars and the stack in some way (I could be mistaken), I'm going for the homogeneous approach. No registers for me though.

On another note, although I've tried to make the assembler pretty flexible, I think I'm not going to bother with the JIT possibility, because it's not immediately obvious how to go from instruction, [stack_index], [stack_index] to something like x86... well, it is, but reintroducing registers at this point seems next to impossible, and the code would be sub optimal.

So where machine code is needed, another assembler would be used that generates x86 or whatever from the compiler's AST. I should say from the front-end's AST, since my compiler only does parsing, checking, and high-level optimization and doesn't generate any actual code.

For the time being I'll stick to VM bytecode though, when that's done write a simple runtime library, or hooks to the cstdlib, and write some foundation classes to make it usable. Maybe this time next year I can try using something like softwire to see what it does for performance.

Michael Walter

May 06, 2005, 09:25 AM

Coroutines (yield) can be implemented in terms of general first-class continuations (call/cc). I'm talking about the latter (ex. Ruby, Scheme).

In that setting, storing activation records on the heap might make sense (surely simplifies copying the stack + saves memory).

This thread contains 15 messages.
Hosting by Solid Eight Studios, maker of PhotoTangler Collage Maker.