The vm520 Virtual Machine

The CS520 Virtual Machine (VM) is used to define and execute 32-bit integer and floating-point computations.

The VM is a multiprocessor VM, where the number of processors is defined at VM start-up.

The VM uses two's-complement encoding for integer values and IEEE single-precision format for floating-point values.
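As a small illustration (a C sketch, not code from the VM itself), the same 32-bit word can be viewed either as a two's-complement integer or as an IEEE single-precision value. The helper names `word_as_int` and `word_as_float` are hypothetical, and the sketch assumes the host's `float` is IEEE single precision, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: the host float is a 32-bit IEEE single. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "this sketch assumes a 32-bit float");

/* Hypothetical helper: view a VM word as a two's-complement integer.
   int32_t is guaranteed to use two's complement. */
static int32_t word_as_int(uint32_t w)
{
    int32_t i;
    memcpy(&i, &w, sizeof i);   /* copy the bits, no numeric conversion */
    return i;
}

/* Hypothetical helper: view the same word as an IEEE single-precision value. */
static float word_as_float(uint32_t w)
{
    float f;
    memcpy(&f, &w, sizeof f);   /* memcpy avoids pointer type-punning UB */
    return f;
}
```

For example, the word 0x3F800000 reads as 1.0 through `word_as_float` but as 1065353216 through `word_as_int`; copying with `memcpy` rather than casting pointers keeps the reinterpretation well defined in C.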

The VM contains a single memory of 1,048,576 (2^20) 32-bit words, shared among all processors. Addresses are therefore 20 bits long. The memory is used to store instructions, data and run-time stacks. Run-time stacks grow from high memory toward low memory.
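The memory organization can be sketched in C as a plain array plus a downward-growing stack. The names `memory`, `addr_in_range`, `push` and `pop` are illustrative, not taken from the vm520 sources. The sketch assumes a push first decrements sp and then stores, so sp always addresses the most recently pushed word; this is an assumption, chosen because it is consistent with the frame layout described below for "call".

```c
#include <stdbool.h>
#include <stdint.h>

#define ADDR_BITS 20
#define MEM_WORDS (1u << ADDR_BITS)   /* 1,048,576 words */

/* One memory shared by all processors (illustrative; about 4 MB). */
static uint32_t memory[MEM_WORDS];

/* An address is valid only if it names one of the MEM_WORDS words. */
static bool addr_in_range(int64_t addr)
{
    return addr >= 0 && addr < (int64_t)MEM_WORDS;
}

/* Stacks grow from high memory toward low memory.
   ASSUMPTION: push pre-decrements sp, then stores. */
static bool push(uint32_t *sp, uint32_t value)
{
    if (!addr_in_range((int64_t)*sp - 1))
        return false;          /* stack would leave the memory */
    *sp -= 1;
    memory[*sp] = value;
    return true;
}

/* pop reads the word sp addresses, then moves sp back up. */
static bool pop(uint32_t *sp, uint32_t *value)
{
    if (!addr_in_range(*sp))
        return false;          /* nothing addressable to pop */
    *value = memory[*sp];
    *sp += 1;
    return true;
}
```

Starting with sp at `MEM_WORDS` (one past the highest address) for an empty stack is also an assumption of this sketch; the initial sp value is not specified here.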

Each processor contains a set of registers:

Each processor has the following fetch-execute cycle:
  1. Fetch the instruction that begins at the address in memory given by the pc.

  2. Add one to the pc.

  3. Execute the instruction.

  4. Go to step 1.

The executing processor will halt with an error if the pc is out of range of the available memory when it is used to fetch an instruction. The executing processor will also halt with an error if the fetched instruction does not contain a valid opcode. Certain instructions have unused fields that are expected to contain zero bits; however, if these bits are not zero, the instruction will still execute.
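The fetch-execute cycle and its two halting checks can be sketched as a C loop for a single processor. This is not the VM's implementation: the opcode position (bits 0-7), the opcode values, and the `OP_HALT` and `OP_NOP` instructions are hypothetical placeholders, since the real field layout is given in the format encodings, which are not reproduced here.

```c
#include <stdint.h>

#define MEM_WORDS (1u << 20)

enum status { HALTED_OK, ERR_PC_RANGE, ERR_BAD_OPCODE };

/* Hypothetical opcodes, for this sketch only. */
enum { OP_HALT = 0x00, OP_NOP = 0x01, NUM_OPCODES = 0x02 };

struct cpu {
    int64_t pc;   /* wide signed type so out-of-range values are visible */
};

static enum status run(struct cpu *p, const uint32_t *mem)
{
    for (;;) {
        /* 1. Fetch: the pc must address a word of memory. */
        if (p->pc < 0 || p->pc >= (int64_t)MEM_WORDS)
            return ERR_PC_RANGE;
        uint32_t instr = mem[p->pc];

        /* 2. Add one to the pc before executing. */
        p->pc += 1;

        /* 3. Execute. ASSUMPTION: the opcode occupies bits 0-7. */
        uint32_t opcode = instr & 0xFFu;
        if (opcode >= NUM_OPCODES)
            return ERR_BAD_OPCODE;
        if (opcode == OP_HALT)
            return HALTED_OK;
        /* OP_NOP does nothing; its unused bits are never checked. */

        /* 4. Go back to step 1. */
    }
}
```

Because the opcode check looks only at the opcode field, an instruction with nonzero bits in its unused fields still executes, matching the behavior described above.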

All instructions are 32 bits long. The instructions are described in detail here.

There are eight different instruction formats. The encodings for these formats are described here.

Immediate mode constants and offsets are stored in two's-complement form. When a constant or offset is accessed by its instruction, the value is sign extended to 32 bits. Address fields contain PC-relative addresses and are also stored in two's-complement form. When an address is accessed by its instruction, the value is sign extended to 32 bits before being added to the pc to obtain the effective address. An effective address outside the range of available memory will cause the executing processor to halt with an error.
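Sign extension and PC-relative address computation can be sketched in C as below. The field width is a parameter because the actual widths depend on the instruction format; the 20-bit width used in the example is an assumption. If the pc is incremented before execution, as step 2 of the fetch-execute cycle describes, the pc passed in here already points to the word after the executing instruction. The names `sign_extend` and `effective_address` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEM_WORDS (1u << 20)

/* Sign-extend the low `width` bits of `field` to 32 bits.
   Precondition: 1 <= width <= 32. Uses 64-bit arithmetic to stay
   free of implementation-defined conversions. */
static int32_t sign_extend(uint32_t field, unsigned width)
{
    uint64_t mask = ((uint64_t)1 << width) - 1u;
    int64_t value = (int64_t)(field & mask);
    if (value & ((int64_t)1 << (width - 1)))   /* sign bit set? */
        value -= (int64_t)1 << width;          /* subtract 2^width */
    return (int32_t)value;                     /* always fits in 32 bits */
}

/* PC-relative effective address: pc + sign-extended field.
   Returns false where the VM would halt with an error. */
static bool effective_address(uint32_t pc, uint32_t field, unsigned width,
                              uint32_t *ea)
{
    int64_t addr = (int64_t)pc + sign_extend(field, width);
    if (addr < 0 || addr >= (int64_t)MEM_WORDS)
        return false;
    *ea = (uint32_t)addr;
    return true;
}
```

With a 20-bit field, for example, 0xFFFFB sign-extends to -5, so an instruction executing with the pc at 100 would refer to address 95.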

The "call" instruction pushes the pc (the return address) onto the stack, pushes the contents of the fp register, copies the contents of the sp register into the fp register, and pushes a zero onto the stack. These stack locations mark the beginning of the "frame" for the function being called. Stack frames are used to store local variables, function arguments and intermediate results. Stack locations may be referenced as operands by giving an offset from the fp: local variables are denoted by negative offsets, and arguments passed to the current function are denoted by positive offsets. An offset of 0 denotes the old fp contents, and an offset of 1 denotes the return address. The arguments passed to the current function start at an offset of 2. An offset of -1 denotes the zero word pushed by the "call" instruction; this word is used for function return values (as described below). Local variables can be allocated, starting at offset -2, by subtracting from the sp register.

The "ret" instruction retrieves the return value by popping the stack, sets the fp by popping the stack, sets the pc by popping the stack again, and then stores the return value at offset -1 of the calling function's frame (the word reserved for return values when the calling function was itself called). The net result is to return control to the calling function, leaving the function return value in the caller's offset -1 slot.

The calling conventions for this machine, therefore, are for the calling function to push the arguments onto the stack in reverse order and then execute the "call" instruction, which pushes the pc (return address) onto the stack. The "call" instruction also pushes the old fp contents, establishes a new fp value for the newly called function, and reserves a stack word to hold return values from any function calls that the called function makes. The called function can then subtract from the sp to allocate local variables. The called function accesses its locals using negative offsets (starting at -2, since -1 is used for function return values) and accesses the arguments passed to it using positive offsets starting at 2. The called function is responsible for removing its local variables from the stack (by adding to the sp), pushing its return value, and executing the "ret" instruction.
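The "call" and "ret" semantics can be modeled in a few lines of C. This is a sketch, not the VM's implementation: range checks are omitted, push is assumed to pre-decrement sp, and the names `vm_call`, `vm_ret` and `frame_slot` are hypothetical. Read literally, the descriptions above imply two details that the sketch's usage assumes: the callee must add enough to sp to discard the offset -1 word as well as its locals (so that sp equals fp) before pushing its return value, and, because "ret" does not pop the arguments, the caller removes them afterwards.

```c
#include <stdint.h>

#define MEM_WORDS (1u << 20)

static uint32_t mem[MEM_WORDS];

struct cpu { uint32_t pc, sp, fp; };

/* ASSUMPTION: push pre-decrements sp, so sp addresses the top word. */
static void push(struct cpu *c, uint32_t v) { mem[--c->sp] = v; }
static uint32_t pop(struct cpu *c) { return mem[c->sp++]; }

/* call: range checks omitted for brevity. */
static void vm_call(struct cpu *c, uint32_t target)
{
    push(c, c->pc);   /* return address; pc already points past the call */
    push(c, c->fp);   /* old fp, offset 0 of the new frame */
    c->fp = c->sp;
    push(c, 0);       /* offset -1: return-value slot for the callee's calls */
    c->pc = target;
}

static void vm_ret(struct cpu *c)
{
    uint32_t result = pop(c);
    c->fp = pop(c);
    c->pc = pop(c);
    mem[c->fp - 1] = result;   /* caller's offset -1 slot */
}

/* Address of a frame slot: 1 = return address, 2.. = arguments,
   -1 = return-value slot, -2.. = locals. */
static uint32_t frame_slot(const struct cpu *c, int32_t offset)
{
    return (uint32_t)((int64_t)c->fp + offset);
}
```

For example, a caller that pushes 22 and then 11 before calling finds 11 at offset 2 and 22 at offset 3 of the callee's frame, and after "ret" the callee's result sits at the caller's offset -1.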

Code to be executed by the VM is stored in "object" files. Each file is a series of 32-bit words, stored in a byte-oriented file in Little Endian fashion (least significant byte first).

The file starts with three words that give the lengths, in words, of the three sections of the file: the first word is the length of the insymbol section, the second is the length of the outsymbol section, and the third is the length of the object-code section.

The three header words are followed by the three sections of the file, which appear in the same order as their header words: insymbols, outsymbols and object code.
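Reading the header can be sketched in C, assuming the whole object file has already been loaded into a byte buffer. The names `read_word`, `parse_header` and `struct object_header` are hypothetical, and the check that the file contains exactly the header plus the three sections is an assumption of this sketch rather than a documented requirement.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the little-endian 32-bit word at word index i. */
static uint32_t read_word(const unsigned char *buf, size_t i)
{
    const unsigned char *b = buf + 4 * i;
    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}

struct object_header {
    uint32_t insym_len, outsym_len, code_len;       /* lengths in words */
    size_t insym_start, outsym_start, code_start;   /* word offsets in file */
};

static bool parse_header(const unsigned char *buf, size_t nbytes,
                         struct object_header *h)
{
    if (nbytes % 4 != 0 || nbytes < 12)
        return false;
    size_t nwords = nbytes / 4;
    h->insym_len  = read_word(buf, 0);
    h->outsym_len = read_word(buf, 1);
    h->code_len   = read_word(buf, 2);
    /* Sections follow the header in the same order as the lengths. */
    h->insym_start  = 3;
    h->outsym_start = h->insym_start + h->insym_len;
    h->code_start   = h->outsym_start + h->outsym_len;
    /* ASSUMPTION: nothing follows the object-code section. */
    return (uint64_t)3 + h->insym_len + h->outsym_len + h->code_len == nwords;
}
```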

The insymbol section contains a sequence of (symbol, address) pairs for those symbols defined in the object file that are being exported to the linker. An exported symbol is limited to 16 characters and is stored in four 32-bit words of the insymbol section. The leftmost character of the symbol is stored in bits 0-7 of the first word of the (symbol, address) pair, the second character in bits 8-15, the third character in bits 16-23, and the fourth character in bits 24-31. Characters 5-8 of the symbol are stored the same way in the second word of the pair, characters 9-12 in the third word, and characters 13-16 in the fourth word. If the symbol is shorter than 16 characters, it is padded with null (0) bytes before being stored in the insymbol section. The fifth word of the pair contains the address of the symbol, expressed relative to the beginning of the object code; that is, the first word in the object-code section is assumed to be at address zero. Note that, since there are five words for each (symbol, address) pair, the length of the insymbol section should be a multiple of five.
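Decoding one five-word (symbol, address) entry can be sketched in C, assuming the entry's words have already been converted from the file's little-endian bytes into 32-bit values. The function name `decode_symbol` is hypothetical. The output buffer holds 17 bytes so that a full 16-character symbol, which has no null padding, still gets a terminating null.

```c
#include <stdint.h>

#define SYM_MAX 16

/* Unpack entry[0..3] into a null-terminated name and return entry[4],
   the address relative to the start of the object-code section. */
static uint32_t decode_symbol(const uint32_t entry[5], char name[SYM_MAX + 1])
{
    for (int w = 0; w < 4; w++)
        for (int c = 0; c < 4; c++)
            /* character 4*w + c lives in bits 8*c .. 8*c+7 of word w */
            name[4 * w + c] = (char)((entry[w] >> (8 * c)) & 0xFFu);
    name[SYM_MAX] = '\0';   /* shorter names are already null-padded */
    return entry[4];
}
```

The number of entries in a section is its length divided by five, and the same routine applies to the outsymbol section, whose structure is identical.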

The outsymbol section contains a sequence of (symbol, address) pairs for those symbols referenced but not defined in the object file. (These symbols will hopefully be located by the linker.) The structure of the outsymbol section is identical to the structure of the insymbol section. A symbol can appear multiple times in the outsymbol section, once for each reference to the symbol. The address given with the symbol is the address of the instruction referencing the symbol. This address is expressed relative to the beginning of the object code.

The object code section contains the encoded instructions, the machine code itself.

The vm520 assembler, as520, is described here.


Last modified on December 6, 2010.

Comments and questions should be directed to hatcher@unh.edu