MVM: Minimal Virtual Machine

Further pages: Bubblesort

Design goals

MVM is a virtual machine specification designed to be as simple as possible to implement. Porting MVM to a new platform should require almost no effort. Still, generating code for MVM should not be unduly complicated. MVM code may be interpreted or JIT-compiled.

Advantages

very simple specification
ultra-portable
provides a reliable code execution sandbox
VMs can be suspended, resumed, frozen, cloned, migrated and monitored at any time
perfectly suited for running untrusted mobile code

Design

MVM is based on a virtual processor with only a single instruction (OISC). The instruction I chose is subleq (subtract and branch if less or equal). Usually the instruction has only one addressing mode (mem/mem); for practical purposes, I added a second addressing mode (const/mem).

The MVM processor has no registers or flags.

MVM's memory model is an array of signed 32 bit integers, starting at offset 0. Memory size may vary. Memory expansion semantics have not been defined yet.

The subleq instruction

The MVM's single instruction has this format:

  subleq a b c

This subtracts the contents of memory location b from the contents of memory location a and stores the result in location a. If the result is zero or negative, the processor branches to target c.

b may be substituted by a constant (this is the alternative addressing mode). In this case we write:

  subleq a #b c

If no branch is desired, one may write:

  sub a b
  sub a #b

Or alternatively:

  a -= b
  a -= #b

On the byte level, this is encoded as subleq a b c where c equals the location of the next instruction (pc+3) and thus suppresses branching.

Encoding

In memory, subleq a b c is encoded as three words (a, b, c). subleq a #b c is encoded (-1-a, b, c).

Minimal interpreter

This is a fully functional MVM interpreter core loop:

  while (pc >= 0) {
    a = memory[pc++];
    b = memory[pc++];
    c = memory[pc++];
    if (a >= 0)
      memory[a] -= memory[b];   /* mode 1 */
    else
      memory[a = ~a] -= b;      /* mode 2 */
    if (memory[a] <= 0)
      pc = c;
  }

Add some I/O and voila.

Z

A special memory location called Z is reserved for containing the value 0 (typically, the location of Z is also 0). Temporarily, Z may contain a different value. It is assumed that any code block that touches Z resets it to 0 afterwards.

Common macros

Let's look at how some standard CPU instructions can be reproduced in MVM.

Copy

Copy a value from one location to another.

Shorthand:

  DST = SRC

Implementation:

  Z -= SRC
  DST -= DST
  DST -= Z
  Z -= Z

Alternative notation:

  Z -= SRC
  clear DST
  DST -= Z
  clear Z

Add

Shorthand:

  A += B

Implementation:

  Z -= B
  A -= Z
  Z -= Z

Indirect read

Shorthand:

  A = [X]

Implementation:

  L = X
  Z -= L:*
  A -= A
  A -= Z
  Z -= Z

Accessing a location that is known only at runtime requires self-modifying code in MVM. In the example, L:* assigns the label L to the instruction's second operand (i.e. the second word in the triple). The actual value is set by the (macro-) instruction L = X.

Indirect store

Shorthand:

  [X] = A

An indirect store requires a bit more patching than an indirect read.

Implementation:

  L1 = X
  L2 = X
  L3 = X
  L1:* -= L2:*
  Z -= A
  L3:* -= Z
  Z -= Z

(Side note: The expansion of the three copy instructions can be simplified to 8 subleqs instead of 12.)